Similar literature
 20 similar articles found (search time: 312 ms)
1.
Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. We propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step, semigapped alignment, compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic, restricted insertion alignment, that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.

2.
MOTIVATION: Many biological applications require the comparison of large genome strings. Current techniques suffer from high computational and I/O costs. RESULTS: We propose an efficient technique for local alignment of large genome strings. A space-efficient index is computed for one string, and the second string is compared with this index in order to prune substring pairs that do not contain similar regions. The remaining substring pairs are handed to a hash-table-based tool, such as BLAST, for alignment. A dynamic strategy is employed to optimize the number of disk seeks needed to access the hash table. Additionally, our technique provides the user with a coarse-grained visualization of the similarity pattern, quickly and before the actual search. The experimental results show that our technique aligns genome strings up to two orders of magnitude faster than BLAST. Our technique can be used to accelerate other search tools as well. AVAILABILITY: A web-based demo can be found at http://bioserver.cs.ucsb.edu/. Source code is available from the authors on request.
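
The pruning idea in this abstract (index one string, then discard regions of the other string that share too little with it) can be illustrated with a minimal Python sketch. This is a generic k-mer filter written under stated assumptions, not the authors' space-efficient index: the function names, the fixed non-overlapping windows, and the `min_hits` threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def candidate_windows(index, query, k, window, min_hits):
    """Return start offsets of non-overlapping query windows that share
    at least min_hits k-mers with the indexed string.

    Windows below the threshold are pruned; only the survivors would be
    handed to a full alignment tool.
    """
    starts = []
    for start in range(0, max(len(query) - window + 1, 1), window):
        chunk = query[start:start + window]
        hits = sum(
            1 for i in range(len(chunk) - k + 1) if chunk[i:i + k] in index
        )
        if hits >= min_hits:
            starts.append(start)
    return starts
```

In this sketch a caller would build the index once for the first string and then call `candidate_windows` for the second string, so that only the surviving windows reach the more expensive alignment stage.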

3.
SUMMARY: BLAST is a widely used alignment tool for detecting matches between a query sequence and entries in nucleotide sequence databases. Matches (high-scoring pairs, HSPs) are assigned a score based on alignment length and quality and, by default, are reported with the top-scoring matches listed first. For certain types of searches, however, this method of reporting is not optimal. This is particularly true when searching a genome sequence with a query that was derived from the same genome, or a closely related one. If the genome is complex and the assembly is far from complete, correct matches are often relegated to low positions in the results, where they may be easily overlooked. To rectify this problem, we developed TruMatch, a program that parses standard BLAST outputs and identifies HSPs that involve query segments with unique matches to the assembly. Candidates for bona fide matches between a query sequence and a genome assembly are listed at the top of the TruMatch output. AVAILABILITY: TruMatch is written in Perl and is freely available to non-commercial users via web download at the URL: http://genome.kbrin.uky.edu/fungi_tel/TruMatch/
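
The re-ranking described above can be sketched in Python as follows. This is only an illustration of the general idea, not TruMatch itself (which is written in Perl and whose exact criteria are not given in the abstract). It assumes HSPs have already been parsed into records, and it treats a query segment as "unique" when it overlaps no other HSP on the query; the `HSP` type and `unique_first` function are hypothetical names.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    query_start: int  # inclusive coordinate on the query
    query_end: int    # inclusive coordinate on the query
    subject: str      # identifier of the matched contig (illustrative)
    score: float


def overlaps(a, b):
    """True if the query intervals of two HSPs share at least one position."""
    return a.query_start <= b.query_end and b.query_start <= a.query_end


def unique_first(hsps):
    """List HSPs with unique query segments first, then all others.

    Within each group, HSPs are ordered by descending score.
    """
    unique, repeated = [], []
    for i, h in enumerate(hsps):
        clash = any(overlaps(h, o) for j, o in enumerate(hsps) if j != i)
        (repeated if clash else unique).append(h)

    def by_score(h):
        return -h.score

    return sorted(unique, key=by_score) + sorted(repeated, key=by_score)
```

With this ordering, a low-scoring HSP whose query segment matches only one place in the assembly is listed ahead of higher-scoring HSPs that compete for the same query region.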

4.
MOTIVATION: Two proteins can have a similar 3-dimensional structure and biological function, but have sequences sufficiently different that traditional protein sequence comparison algorithms do not identify their relationship. The desire to identify such relations has led to the development of more sensitive sequence alignment strategies. One such strategy is the Intermediate Sequence Search (ISS), which connects two proteins through one or more intermediate sequences. In its brute-force implementation, ISS is a strategy that repetitively uses the results of the previous query as new search seeds, making it time-consuming and difficult to analyze. RESULTS: Saturated BLAST is a package that performs ISS in an efficient and automated manner. It was developed using Perl and Perl/Tk and implemented on the Linux operating system. Starting with a protein sequence, Saturated BLAST runs a BLAST search and identifies representative sequences for the next generation of searches. The procedure is run until convergence or until some predefined criteria are met. Saturated BLAST has a friendly graphical user interface, a built-in BLAST result parser, several multiple alignment tools, clustering algorithms and various filters for the elimination of false positives, thereby providing an easy way to edit, visualize, analyze, monitor and control the search. Besides detecting remote homologies, Saturated BLAST can be used to maintain protein family databases and to search for new genes in genomic databases.
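
The brute-force ISS loop described in this abstract (reuse each round's hits as the next round's seeds until nothing new appears) can be sketched in Python as below. This is a hedged illustration, not Saturated BLAST: the `search` argument is a hypothetical callable standing in for a real BLAST run, and the sketch uses every new hit as a seed, whereas the package selects representative sequences for each generation.

```python
def intermediate_sequence_search(seed, search, max_rounds=10):
    """Iteratively expand a set of related sequences from one seed.

    `search` is a caller-supplied function taking a sequence identifier
    and returning an iterable of hit identifiers (an assumption of this
    sketch). The loop stops at convergence (no new hits) or after
    max_rounds generations.
    """
    found = {seed}
    frontier = [seed]
    for _ in range(max_rounds):
        next_frontier = []
        for query in frontier:
            for hit in search(query):
                if hit not in found:
                    found.add(hit)
                    next_frontier.append(hit)
        if not next_frontier:  # converged: this round found nothing new
            break
        frontier = next_frontier
    return found
```

The `max_rounds` cap plays the role of the "predefined criteria" mentioned in the abstract; in practice a score or E-value filter inside `search` would also be needed to limit false positives.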

5.
In this study, the flanking sequences of 1534 horse microsatellites were used in a BLAST search to identify putative human-horse homologies. BLAST searches revealed 129 flanking sequences with significant blastn matches [alignment scores (S) ≥ 60 and sum probability values (E) ≤ 3.0E-6]; 25 of these also produced significant blastx matches. To provide a reference point in the human genome, the flanking sequences with matches were subjected to a BLAT search of the University of California Santa Cruz (UCSC) human genome assembly (July 2003 freeze). Eighty-three of the flanking sequences showed high similarity to sequence of known or putative human genes and the remaining 46 demonstrated high similarity to human intragenic regions. Interestingly, 87 of the microsatellites showed conservation of the tandem repeat in addition to flanking regions. Overall, 41 of the microsatellites had been mapped in the horse and of these 37 localized to the expected syntenic location. The other four did not and represent new putative regions of human-horse synteny. The results of this study contribute 79 new putative human-horse homologies, increasing the density of markers on the human-horse comparative map.

6.
MOTIVATION: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation. RESULTS: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene. AVAILABILITY: The Java implementation is available for download from http://www.bioinformatics.uwaterloo.ca/software.

7.
Sequence comparison methods based on position-specific score matrices (PSSMs) have proven a useful tool for recognition of the divergent members of a protein family and for annotation of functional sites. Here we investigate one of the factors that affects overall performance of PSSMs in a PSI-BLAST search, the algorithm used to construct the seed alignment upon which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW-pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance with respect to identification of conserved functional or structural sites, we examine the accuracy of the three-dimensional molecular models predicted by PSSM-sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction in the pattern local-structure (VAST) > local-sequence (BLAST) > global-sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30-50% average pairwise identity. We discuss these observations, and suggest a strategy for constructing seed alignments that optimize PSSM-sequence alignment accuracy and recognition sensitivity.

8.
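Transcriptional up-regulation of HSP70 and HSP90 in Eisenia fetida induced by low-concentration benzo[a]pyrene   Total citations: 1, self-citations: 1, citations by others: 0
To identify molecular biomarkers of low-concentration polycyclic aromatic hydrocarbon contamination in soil, suppression subtractive hybridization was used to construct a differentially expressed cDNA library of the earthworm Eisenia fetida under benzo[a]pyrene (BaP) stress in artificially contaminated soil. After sequencing and sequence alignment analysis, two cDNA clones significantly matching heat shock protein HSP70 and one clone significantly matching HSP90 were found in the up-regulated library. Quantitative PCR verified the induction of E. fetida HSP70 and HSP90 by 0.1 mg·kg-1 and 1.0 mg·kg-1 BaP, indicating that these two newly cloned E. fetida heat shock protein genes can serve as candidate molecular biomarkers for soil pollution monitoring.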

9.
Filtration techniques in the form of rapid elimination of candidate sequences while retaining the true one are key ingredients of database searches in genomics. Although SEQUEST and Mascot perform a conceptually similar task to the tool BLAST, the key algorithmic idea of BLAST (filtration) was never implemented in these tools. As a result MS/MS protein identification tools are becoming too time-consuming for many applications including search for post-translationally modified peptides. Moreover, matching millions of spectra against all known proteins will soon make these tools too slow in the same way that "genome vs genome" comparisons instantly made BLAST too slow. We describe the development of filters for MS/MS database searches that dramatically reduce the running time and effectively remove the bottlenecks in searching the huge space of protein modifications. Our approach, based on a probability model for determining the accuracy of sequence tags, achieves superior results compared to GutenTag, a popular tag generation algorithm. Our tag generating algorithm along with our de novo sequencing algorithm PepNovo can be accessed via the URL http://peptide.ucsd.edu/.

10.
The SYSTERS (short for SYSTEmatic Re-Searching) protein sequence cluster set consists of the classification of all sequences from SWISS-PROT and PIR into disjoint protein family clusters and hierarchically into superfamily and subfamily clusters. The cluster set can be searched with a sequence using the SSMAL search tool or a traditional database search tool like BLAST or FASTA. Additionally, a multiple alignment is generated for each cluster and annotated with domain information from the Pfam database of protein domain families. A taxonomic overview of the organisms covered by a cluster is given based on the NCBI taxonomy. The cluster set is available for querying and browsing at http://www.dkfz-heidelberg.de/tbi/services/cluster/systersform

11.
12.
Liu Y, Li J. Current Microbiology, 2011, 62(3): 770-776
Our understanding of the interaction between bacteria and humans is still incomplete. The recent availability of many microbial genomes and the human genome, together with the basic local alignment search tool (BLAST) family of programs, offers a new perspective on this interaction. This study sets out to determine whether sequence identity exists between the genomes of bacteria and humans, and to explain this phenomenon in terms of bacteriophages and other mobile genetic elements. BLAST searches of the genomes of bacteria, bacteriophages, and plasmids against the human genome were performed using the resources of the National Center for Biotechnology Information (NCBI). All studied bacteria contain variable numbers of short regions of sequence identity to the human genome, ranging from 27 to 84 nt in length. These regions were found at multiple sites within the human genome. Based on the short regions of sequence identity between bacterial and human genomes, we develop the hypothesis that viruses, especially bacteriophages, might play a significant role in shaping the genomes of bacteria and humans and contribute to these short regions of sequence identity.

13.
Basic Local Alignment Search Tool (BLAST) is a popular tool used for finding patterns in genomic sequences. The BLAST algorithm has undergone various changes over time. BLAST spends one third of its running time performing gapped analysis on the sequences. An efficient algorithm is presented that uses a new approach to reduce the number of sequences that proceed to gapped alignment. The method therefore runs after the ungapped alignment stage is complete. It works because gapped alignment need not be performed for every sequence that passes ungapped analysis. The speed of the alignment process increases significantly without compromising the sensitivity of the results.

14.
Fifty-four new markers were developed to fill in gaps in the current map of canine microsatellites and to complement existing markers that may not be sufficiently informative in highly inbred canine pedigrees. Canine genes contained on the radiation hybrid map were used to obtain the sequence of the human homolog. A BLAST search against the canine whole genome shotgun (wgs) sequence resource was used to obtain the sequence of the canine genomic contigs containing the homolog of the corresponding human gene. Canine sequences that contained microsatellites and mapped back to the correct location in the human genome were used to design primers for amplification of the microsatellites from canine genomic DNA. Heterozygosities of the markers were tested by genotyping grandparental DNAs obtained from the Nestle Purina Reference family DNA distribution center plus DNAs from unrelated Bouviers and Irish wolfhounds. Canine map positions of markers on the July 2004 freeze of the canine genome assembly were determined by in silico PCR or BLAST.

15.
MOTIVATION: Multiple sequence alignment is an important tool in computational biology. In order to solve the task of computing multiple alignments in affordable time, the most commonly used multiple alignment methods have to use heuristics. Nevertheless, the computation of optimal multiple alignments is important in its own right, and it provides a means of evaluating heuristic approaches or serves as a subprocedure of heuristic alignment methods. RESULTS: We present an algorithm that uses the divide-and-conquer alignment approach together with recent results on search space reduction to speed up the computation of multiple sequence alignments. The method is adaptive in that, depending on the time one wants to spend on the alignment, a better, up to optimal alignment can be obtained. To speed up the computation in the optimal alignment step, we apply the A* algorithm, which leads to a procedure provably more efficient than previous exact algorithms. We also describe our implementation of the algorithm and present results showing the effectiveness and limitations of the procedure.

16.
In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced and, as a result, growth of publicly maintained sequence databases. This increase in data places high demands on protein similarity search algorithms, which face two opposing goals: keeping running times acceptable while maintaining a sufficiently high level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignment of a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before doing the exact local alignment. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4–5 times faster than SSEARCH, 6–25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-prot and Uniref90 databases.
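
The quadratic cost mentioned above comes from the Smith-Waterman recursion, which fills one cell per pair of positions in the two sequences. The Python sketch below computes only the optimal local alignment score and is meant as an illustration, not as the SW#db implementation: it assumes a simple match/mismatch scheme with a linear gap penalty, and the default parameter values are arbitrary, whereas production tools typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the
    dynamic programming matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every query-subject pair costs a full matrix fill, reducing the database with a heuristic filter before this step, as the abstract describes, directly reduces the total work.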

17.
18.
Kim S, Kang J, Chung YJ, Li J, Ryu KH. Proteins, 2008, 71(3): 1113-1122
The quality of orthologous protein clusters (OPCs) is largely dependent on the results of the reciprocal BLAST (basic local alignment search tool) hits among genomes. The BLAST algorithm is very efficient and fast, but it is very difficult to obtain an optimal solution among phylogenetically distant species because genomes separated by a large evolutionary distance typically have low similarity in their protein sequences. To reduce the false positives in the OPCs, thresholding is often employed on the BLAST scores. However, thresholding also eliminates large numbers of true positives, as orthologs from distant species are likely to have low BLAST scores. To rectify this problem, we introduce a new hybrid method combining the Recursive and the Markov CLuster (MCL) algorithms without using BLAST thresholding. In the first step, we use InParanoid to produce n(n-1)/2 ortholog tables from n genomes. After combining all the tables into one, our clustering algorithm clusters ortholog pairs recursively in the table. Then, our method employs the MCL algorithm to compute the clusters and refines the clusters by adjusting the inflation factor. We tested our method using six different genomes and evaluated the results by comparing against KEGG Orthology (KO) OPCs, which are generated from manually curated pathways. To quantify the accuracy of the results, we introduced a new intuitive similarity measure based on our Least-move algorithm that computes the consistency between two OPCs. We compared the resulting OPCs with the KO OPCs using this measure. We also evaluated the performance of our method using InParanoid as the baseline approach. The experimental results show that, at an inflation factor of 1.3, our method produced 54% more orthologs than InParanoid at the cost of slightly lower accuracy (1.7% less), and at a factor of 1.4 it produced not only 15% more orthologs than InParanoid but also higher accuracy (1.4% more).
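
The first two steps in this abstract can be illustrated with a short Python sketch: enumerating the n(n-1)/2 genome pairs (15 pairs for the six genomes tested) and merging ortholog pairs from the combined table into clusters. The merging below uses a union-find structure to group proteins connected by any chain of ortholog pairs, which is an assumption standing in for the paper's recursive clustering; the MCL refinement step is not shown, and all names are hypothetical.

```python
from itertools import combinations


def genome_pairs(genomes):
    """Enumerate the n(n-1)/2 unordered genome pairs, one per ortholog table."""
    return list(combinations(genomes, 2))


def cluster_pairs(pairs):
    """Group proteins into clusters by transitively merging ortholog pairs.

    Returns a sorted list of clusters, each a sorted list of identifiers.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(members) for members in clusters.values())
```

Purely transitive merging tends to produce overly large clusters when a single spurious pair links two families, which is one reason a graph clustering step such as MCL with a tunable inflation factor is applied afterwards.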

19.
Basic local alignment search tool   Total citations: 1594, self-citations: 0, citations by others: 1594
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
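
The MSP score named in this abstract is the highest score of any pair of equal-length segments, one from each sequence, aligned without gaps. The Python sketch below computes it exactly by scanning every diagonal with a running maximum, which takes quadratic time; BLAST itself approximates this quantity with a much faster heuristic that is not reproduced here. The match and mismatch values are illustrative assumptions, and the function name is hypothetical.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Return the maximal segment pair score of a and b by exhaustive scan.

    Each diagonal (fixed offset between positions in a and b) is an
    ungapped alignment; the best-scoring contiguous run on any diagonal
    is the MSP.
    """
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        running = 0
        while i < len(a) and j < len(b):
            running += match if a[i] == b[j] else mismatch
            running = max(running, 0)  # drop a segment once it scores below zero
            best = max(best, running)
            i += 1
            j += 1
    return best
```

Comparing this exhaustive scan with the Smith-Waterman recursion shows the trade-off the abstract relies on: ungapped segment pairs are simpler to score and analyze statistically, at the cost of ignoring insertions and deletions.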

20.