Similar articles
Found 20 similar articles (search time: 15 ms)
1.
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences (<5% of the human genome), but is inappropriate when looking for repeats in the majority of genomic sequence where indels are common. In this paper, we assume a model of homologous sequence alignment which includes indels and we describe a new seed model, called indel seeds, which explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspect of using indel seeds and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.

2.
PatternHunter: faster and more sensitive homology search (cited 15 times in total: 0 self-citations, 15 citations by others)
MOTIVATION: Genomics and proteomics studies routinely depend on homology searches based on the strategy of finding short seed matches which are then extended. The exploding genomic data growth presents a dilemma for DNA homology search techniques: increasing seed size decreases sensitivity whereas decreasing seed size slows down computation. RESULTS: We present a new homology search algorithm 'PatternHunter' that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed. At Blast levels of sensitivity, PatternHunter is able to find homologies between sequences as large as human chromosomes, in mere hours on a desktop. AVAILABILITY: PatternHunter is available at http://www.bioinformaticssolutions.com, as a commercial package. It runs on all platforms that support Java. PatternHunter technology is being patented; commercial use requires a license from BSI, while non-commercial use will be free.

3.
4.
MOTIVATION: Filtration is an important technique used to speed up local alignment, as exemplified in the BLAST programs. Recently, Ma et al. discovered that better filtering can be achieved in their PatternHunter program by spacing out the matching positions that trigger a local alignment according to a certain pattern, instead of requiring contiguous positions. Such a match pattern is called a spaced seed. RESULTS: Our numerical computation shows that the ranks of spaced seeds (based on sensitivity) change with the sequence similarity. Since homologous sequences may have diverse similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in DNA genomic sequences. We validate that the listed spaced seeds are indeed more sensitive using three arbitrarily chosen pairs of DNA genomic sequences.
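
The idea of a spaced seed described in this abstract can be illustrated with a minimal Python sketch. It is not taken from PatternHunter or BLAST: the function name `seed_hits`, the toy sequences, and the seed patterns are assumptions chosen for illustration. A seed is written as a string of `1` (position must match) and `0` (position is ignored), and a hit is any pair of windows that agree at every `1` position.

```python
def seed_hits(query, target, seed):
    """Return sorted-by-query (i, j) pairs where the seed matches
    query[i:i+len(seed)] against target[j:j+len(seed)].

    `seed` is a string of '1' (must match) and '0' (don't care).
    Hypothetical helper for illustration only.
    """
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)

    # Index every target window by the letters at the "care" positions.
    index = {}
    for j in range(len(target) - span + 1):
        key = "".join(target[j + k] for k in care)
        index.setdefault(key, []).append(j)

    # Look up every query window in the index.
    hits = []
    for i in range(len(query) - span + 1):
        key = "".join(query[i + k] for k in care)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


query = "ACGTACGT"
target = "ACGAACGT"
spaced = seed_hits(query, target, "1101")   # weight 3, span 4
contiguous = seed_hits(query, target, "111")  # weight 3, span 3
```

In this toy case the spaced seed `1101` reports the pair (1, 1), where the windows `CGTA` and `CGAA` differ only at the ignored position, while the contiguous seed `111` of the same weight does not report that pair.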

5.
Extending the single optimized spaced seed of PatternHunter(20) to multiple ones, PatternHunter II simultaneously remedies the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman, for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search methodology research back to a full circle.

6.
7.
MOTIVATION: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smith-Waterman sensitivity is approached at BLASTn speed. However, computing optimal multiple spaced seeds was proved to be NP-hard and current heuristic algorithms are all very slow (exponential). RESULTS: We give a simple algorithm which computes good multiple seeds in polynomial time. Due to a completely different approach, the difference with respect to the previous methods is dramatic. The multiple spaced seed of PatternHunterII, with 16 weight 11 seeds, was computed in 12 days. It takes us 17 s to find a better one. Our approach changes the way of looking at multiple spaced seeds.

8.
MOTIVATION: Homology search for RNAs can use secondary structure information to increase power by modeling base pairs, as in covariance models, but the resulting computational costs are high. Typical acceleration strategies rely on at least one filtering stage using sequence-only search. RESULTS: Here we present the multi-segment CYK (MSCYK) filter, which implements a heuristic of ungapped structural alignment for RNA homology search. Compared to gapped alignment, this approximation has lower computation time requirements (O(N^4) reduced to O(N^3)) and space requirements (O(N^3) reduced to O(N^2)). A vector-parallel implementation of this method gives up to 100-fold speed-up; vector-parallel implementations of standard gapped alignment at two levels of precision give 3- and 6-fold speed-ups. These approaches are combined to create a filtering pipeline that scores RNA secondary structure at all stages, with results that are synergistic with existing methods.

9.
SUMMARY: New ideas, spaced seeds and gapped alignment before 6-frame translation are implemented for translated homology search in tPatternHunter. The new software compares favorably with tBLASTx. AVAILABILITY: The software is free to academics at http://www.bioinformaticssolutions.com/downloads/ph-academic/ CONTACT: bma@csd.uwo.ca.

10.
In homology search, good spaced seeds have higher sensitivity for the same cost (weight). However, elucidating the mechanism that confers power to spaced seeds and characterizing optimal spaced seeds still remain unsolved. This paper investigates these two important open questions by formally analyzing the average number of non-overlapping hits and the hit probability of a spaced seed in the Bernoulli sequence model. We prove that when the length of a non-uniformly spaced seed is bounded above by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed of the same weight in both 1) the average number of non-overlapping hits and 2) the asymptotic hit probability. This clearly answers the first problem mentioned above in the Bernoulli sequence model. The theoretical study in this paper also gives a new solution to finding long optimal seeds.
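
The hit probability mentioned in this abstract can be made concrete with a small Python sketch. It is not the paper's analysis: it is a brute-force enumeration, practical only for short regions, and the function name `hit_probability` and its parameters are assumptions for illustration. The Bernoulli model is taken here to mean that each position of an ungapped alignment is a match (1) with probability `p`, independently of the others; a seed hits if all of its `1` positions fall on matches at some offset.

```python
from itertools import product


def hit_probability(seed, region_len, p):
    """Exact probability that `seed` hits a random 0/1 region of length
    `region_len`, where each position is 1 (match) with probability p.

    Brute force over all 2**region_len regions; hypothetical helper meant
    for small inputs only.
    """
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    total = 0.0
    for bits in product((0, 1), repeat=region_len):
        # The seed hits if some offset has a match at every '1' position.
        hit = any(
            all(bits[start + k] for k in care)
            for start in range(region_len - span + 1)
        )
        if hit:
            ones = sum(bits)
            total += p ** ones * (1 - p) ** (region_len - ones)
    return total


# A region exactly as long as the seed is hit with probability p**weight.
single_window = hit_probability("1101", 4, 0.7)
# Two overlapping windows for the contiguous seed: 2*p**3 - p**4.
two_windows = hit_probability("111", 4, 0.5)
```

Comparing `hit_probability("111", n, p)` with `hit_probability("1101", n, p)` for a range of `n` and `p` is one way to explore the effect the abstract describes, within the limits of exhaustive enumeration.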

11.
We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds or ungapped alignment seeds to reduce noise hits. We model picking a set of seed models as an integer programming problem and give algorithms to choose such a set of seeds. While the problem is NP-hard, and Quasi-NP-hard to approximate to within a logarithmic factor, it can be solved easily in practice. A good set of seeds we have chosen allows four to five times fewer false positive hits, while preserving essentially identical sensitivity as BLASTP.

12.
A new method for homology search of DNA sequences is suggested. This method may be used to find extensive but weak homologies with point mutations and deletions. The running time of the program for comparing sequences is at least two orders of magnitude less than that of dynamic programming algorithms. This makes it possible to use the method for homology searches throughout the nucleotide bank on personal computers.

13.
Homologous recombination, an essential process for preserving genomic integrity, uses intact homologous sequences to repair broken chromosomes. To explore the mechanism of homologous pairing in vivo, we tagged two homologous loci in diploid yeast Saccharomyces cerevisiae cells and investigated their dynamic organization in the absence and presence of DNA damage. When neither locus is damaged, homologous loci occupy largely separate regions, exploring only 2.7% of the nuclear volume. Following the induction of a double-strand break, homologous loci co-localize ten times more often. The mobility of the cut chromosome markedly increases, allowing it to explore a nuclear volume that is more than ten times larger. Interestingly, the mobility of uncut chromosomes also increases, allowing them to explore a four times larger volume. We propose a model for homology search in which increased chromosome mobility facilitates homologous pairing. Finally, we find that the increase in DNA dynamics is dependent on early steps of homologous recombination.

14.
Frith MC. PloS one, 2011, 6(12): e28819
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with "gentle" masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0,S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to "harsh" masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
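
The scoring rule stated in this abstract, min(0,S) for any pair involving a masked letter, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: masked letters are assumed to be written in lowercase, the match and mismatch scores of +1 and -1 are hypothetical, and the function names `pair_score` and `ungapped_score` are invented here.

```python
def pair_score(a, b, match=1, mismatch=-1):
    """Score one aligned letter pair with gentle masking.

    Assumptions for illustration: lowercase letters are masked, and the
    unmasked score S is `match` for equal letters, `mismatch` otherwise.
    """
    s = match if a.upper() == b.upper() else mismatch
    if a.islower() or b.islower():
        # Gentle masking: a masked letter can never add to the score,
        # but a mismatch still costs what it would cost unmasked.
        return min(0, s)
    return s


def ungapped_score(x, y):
    """Sum of pair scores for two equal-length, ungapped sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(x, y))


# The masked 'atat' tract contributes 0; the flanking matches contribute 4.
score = ungapped_score("ACatatGT", "ACATATGT")
```

Under these assumptions a low-complexity tract cannot create a high-scoring alignment by itself, yet an alignment that starts in unmasked sequence can pass through the tract without losing score on matching letters.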

15.
16.
17.
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13 000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.

18.
Savir Y, Tlusty T. Molecular cell, 2010, 40(3): 388-396
Homologous recombination facilitates the exchange of genetic material between homologous DNA molecules. This crucial process requires detecting a specific homologous DNA sequence within a huge variety of heterologous sequences. The detection is mediated by RecA in E. coli, or members of its superfamily in other organisms. Here, we examine how well the RecA-DNA interaction is adjusted to its task. By formulating the DNA recognition process as a signal detection problem, we find the optimal value of binding energy that maximizes the ability to detect homologous sequences. We show that the experimentally observed binding energy is nearly optimal. This implies that the RecA-induced deformation and the binding energetics are fine-tuned to ensure optimal sequence detection. Our analysis suggests a possible role for DNA extension by RecA, in which deformation enhances detection. The present signal detection approach provides a general recipe for testing the optimality of other molecular recognition systems.

19.
Bacillus subtilis RecO plays a central role in recombinational repair and genetic recombination by (i) stimulating RecA filamentation onto SsbA-coated single-stranded (ss) DNA, (ii) modulating the extent of RecA-mediated DNA strand exchange and (iii) promoting annealing of complementary DNA strands. Here, we report that RecO-mediated strand annealing is facilitated by cognate SsbA, but not by a heterologous one. Analysis of non-productive intermediates reveals that RecO interacts with SsbA-coated ssDNA, resulting in transient ternary complexes. The self-interaction of ternary complexes via RecO led to the formation of large nucleoprotein complexes. In the presence of homology, SsbA, at the nucleoprotein, removes DNA secondary structures, inhibits spontaneous strand annealing and facilitates RecO loading onto SsbA–ssDNA complex. RecO relieves SsbA inhibition of strand annealing and facilitates transient and random interactions between homologous naked ssDNA molecules. Finally, both proteins lose affinity for duplex DNA. Our results provide a mechanistic framework for rationalizing protein release and dsDNA zippering as coordinated events that are crucial for RecA-independent plasmid transformation.

20.
New human and mouse microRNA genes found by homology search (cited 2 times in total: 0 self-citations, 2 citations by others)
Weber MJ. The FEBS journal, 2005, 272(1): 59-73
