20 similar articles found.
1.
Extending the single optimized spaced seed of PatternHunter (20) to multiple seeds, PatternHunter II remedies both the lack of sensitivity of Blastn and the lack of speed of Smith-Waterman for homology search. At Blastn speed, PatternHunter II approaches Smith-Waterman sensitivity, bringing homology search methodology research full circle.
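To make the multiple-seed idea concrete, here is a minimal sketch (our own illustration, not PatternHunter II's code) that scans a match/mismatch string for hits of a family of spaced seeds. It assumes the alignment has already been reduced to '1' (match) and '0' (mismatch) columns with no indels; the helper names `seed_hits` and `multi_seed_hit` are hypothetical. A region counts as detected if any seed in the family hits it, which is how several seeds can raise sensitivity over a single one.

```python
def seed_hits(alignment: str, seed: str) -> list[int]:
    """Start offsets where `seed` hits `alignment`.

    `alignment`: '1' = match column, '0' = mismatch column (hypothetical encoding).
    `seed`: '1' = position that must match, '0' = don't-care position.
    """
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    return [
        start
        for start in range(len(alignment) - span + 1)
        if all(alignment[start + k] == "1" for k in care)
    ]


def multi_seed_hit(alignment: str, seeds: list[str]) -> bool:
    """A region is detected if at least one seed in the family hits it."""
    return any(seed_hits(alignment, s) for s in seeds)
```

For example, with alignment `"11011011"` the contiguous seed `"111"` finds nothing, while adding the spaced seed `"1101"` to the family yields hits at offsets 0 and 3.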
2.
SUMMARY: New ideas, namely spaced seeds and gapped alignment before 6-frame translation, are implemented for translated homology search in tPatternHunter. The new software compares favorably with tBLASTx. AVAILABILITY: The software is free to academics at http://www.bioinformaticssolutions.com/downloads/ph-academic/ CONTACT: bma@csd.uwo.ca.
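Six-frame translation means translating the three forward reading frames of a DNA sequence and the three frames of its reverse complement. The sketch below is our illustration using the standard genetic code (NCBI translation table 1); it is not tPatternHunter's implementation, and the function names are hypothetical. Codons containing letters other than A, C, G, T become 'X', and stop codons become '*'.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> list[str]:
    """Frames +1, +2, +3 on the given strand, then +1, +2, +3 on the reverse complement."""
    dna = dna.upper()
    rev_comp = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna[f:]) for f in range(3)] + [translate(rev_comp[f:]) for f in range(3)]
```

For example, `six_frame_translation("ATGGCCTAA")` returns `["MA*", "WP", "GL", "LGH", "*A", "RP"]`.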
3.
PfamAlyzer is a Java applet that enables exploration of Pfam domain architectures using a user-friendly graphical interface. It can search the UniProt protein database for a domain pattern. Domain patterns similar to the query are presented graphically by PfamAlyzer either in a ranked list or pinned to the tree of life. Such domain-centric homology search can assist identification of distant homologs with shared domain architecture. AVAILABILITY: PfamAlyzer has been integrated with the Pfam database and can be accessed at http://pfam.cgb.ki.se/pfamalyzer.
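PfamAlyzer's own scoring of domain-pattern similarity is not described here. As a hedged illustration of domain-centric search in general, the sketch below assumes each protein is reduced to an ordered list of Pfam domain names and ranks proteins by the edit distance between their architecture and the query pattern. The protein identifiers are hypothetical, and unit-cost edit distance is our assumption, not PfamAlyzer's method.

```python
def architecture_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two ordered domain lists (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, da in enumerate(a, start=1):
        cur = [i]
        for j, db in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # drop a domain from `a`
                cur[j - 1] + 1,            # add a domain from `b`
                prev[j - 1] + (da != db),  # keep (cost 0) or substitute (cost 1)
            ))
        prev = cur
    return prev[-1]


def rank_by_architecture(query: list[str], proteins: dict[str, list[str]]) -> list[str]:
    """Closest architectures first; ties broken by identifier for a stable order."""
    return sorted(proteins, key=lambda pid: (architecture_distance(query, proteins[pid]), pid))
```

With the query `["SH3_1", "SH2", "Pkinase"]`, a protein with exactly that architecture ranks first, one lacking the SH3 domain ranks second (distance 1), and an unrelated `["PH", "RhoGEF"]` architecture ranks last (distance 3).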
4.
5.
We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences (<5% of the human genome), but is inappropriate when looking for repeats in the majority of genomic sequence where indels are common. In this paper, we assume a model of homologous sequence alignment which includes indels and we describe a new seed model, called indel seeds, which explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspect of using indel seeds and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.
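The seed-detection step described above (finding small matching parts between a query and database sequences) can be sketched with a hash index keyed on the letters at a seed's care positions. This is a minimal illustration assuming plain match/mismatch seeds written as '1' (must match) and '0' (don't care); it does not implement the indel seeds proposed in the paper, whose notation is not given here. The function names are hypothetical.

```python
from collections import defaultdict


def seed_key(seq: str, start: int, seed: str) -> str:
    """Letters of `seq` at the seed's care ('1') positions, starting at `start`."""
    return "".join(seq[start + k] for k, c in enumerate(seed) if c == "1")


def find_seed_matches(query: str, database: str, seed: str) -> list[tuple[int, int]]:
    """All (query_offset, database_offset) pairs where the seed matches."""
    span = len(seed)
    index: dict[str, list[int]] = defaultdict(list)
    for d in range(len(database) - span + 1):
        index[seed_key(database, d, seed)].append(d)
    hits = []
    for q in range(len(query) - span + 1):
        for d in index.get(seed_key(query, q, seed), []):
            hits.append((q, d))
    return hits
```

With the spaced seed `"101"`, query `"ACA"` hits database `"TAGAT"` at (0, 1) despite the mismatch in the middle column, whereas the contiguous seed `"111"` finds no hit.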
6.
7.
In guppies (Poecilia reticulata) precopulatory sexual selection (via female choice) and post-copulatory selection (via sperm competition) both favour males with relatively high levels of carotenoid (orange) pigmentation, suggesting that colourful males produce more competitive ejaculates. Here we test whether there is a positive association between male orange pigmentation and sperm quality. Our analysis of sperm quality focused on sperm swimming speeds (using CASA: computer-assisted sperm analysis to estimate three parameters of sperm velocity in vitro), sperm viability (proportion of live sperm per stripped ejaculate) and sperm lengths. We found that males with relatively large areas of orange pigmentation had significantly faster and more viable sperm than their less ornamented counterparts, suggesting a possible link between dietary carotenoid intake and sperm quality. By contrast, we found no relationship between sperm length (head length and total sperm length) and male phenotype. These findings, in conjunction with previous work showing that highly ornamented male guppies sire higher quality offspring, suggest that female preference for colourful males and sperm competition work in concert to favour intrinsically higher quality males.
8.
9.
MOTIVATION: Homology search for RNAs can use secondary structure information to increase power by modeling base pairs, as in covariance models, but the resulting computational costs are high. Typical acceleration strategies rely on at least one filtering stage using sequence-only search. RESULTS: Here we present the multi-segment CYK (MSCYK) filter, which implements a heuristic of ungapped structural alignment for RNA homology search. Compared to gapped alignment, this approximation has lower time requirements (O(N⁴) reduced to O(N³)) and lower space requirements (O(N³) reduced to O(N²)). A vector-parallel implementation of this method gives up to a 100-fold speed-up; vector-parallel implementations of standard gapped alignment at two levels of precision give 3- and 6-fold speed-ups. These approaches are combined to create a filtering pipeline that scores RNA secondary structure at all stages, with results that are synergistic with existing methods.
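MSCYK itself and the covariance-model CYK algorithm are not reproduced here. To show where cubic time and quadratic space come from in structure-aware dynamic programming, the sketch below substitutes Nussinov base-pair maximization, a much simpler relative of CYK: it fills an N × N table (O(N²) space) and tries every pairing partner for each cell (O(N³) time). The minimum hairpin-loop length of 3 and the allowed pairs (Watson-Crick plus G-U wobble) are our assumptions.

```python
def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in `seq` (Nussinov DP)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best score for seq[i..j]; O(N^2) space
    for length in range(min_loop + 1, n):  # length = j - i; shorter spans stay 0
        for i in range(n - length):
            j = i + length
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k (loop > min_loop)
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1] if n else 0
```

For example, `nussinov("GGGAAACCC")` returns 3, corresponding to the hairpin `(((...)))`.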
10.
MOTIVATION: Filtration is an important technique used to speed up local alignment, as exemplified by the BLAST programs. Recently, Ma et al. discovered that better filtering can be achieved by requiring matches at positions spaced out according to a certain pattern, rather than at contiguous positions, to trigger a local alignment in their PatternHunter program. Such a match pattern is called a spaced seed. RESULTS: Our numerical computation shows that the ranks of spaced seeds (based on sensitivity) change with sequence similarity. Since homologous sequences may have diverse levels of similarity, we assess the sensitivity of spaced seeds over a range of similarity levels and present a list of good spaced seeds for facilitating homology search in genomic DNA sequences. We validate that the listed spaced seeds are indeed more sensitive using three arbitrarily chosen pairs of genomic DNA sequences.
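Sensitivity here is the probability that a seed hits at least once in a random ungapped region in which each column is a match independently with probability p (the similarity level). The sketch below computes it exactly with a dynamic program over the set of seed placements that are still alive; it is our illustration of this standard model, not the authors' code, and the function name is hypothetical. The region length of 64 in the usage loop is an assumption; the two seeds compared are PatternHunter's weight-11 seed 111*1**1*1**11*111 and a contiguous 11-mer.

```python
def seed_sensitivity(seed: str, p: float, region_len: int) -> float:
    """Exact probability that `seed` ('1' = must match, '0' = don't care) hits at
    least once in `region_len` i.i.d. columns, each a match with probability `p`."""
    span = len(seed)
    full = (1 << span) - 1
    # A mismatch kills every placement whose current offset k is a care position.
    kill = sum(1 << k for k, c in enumerate(seed) if c == "1")
    # State: bitmask of alive placements; bit k = placement that started k
    # columns ago and is checking seed offset k in the current column.
    states = {0: 1.0}
    hit = 0.0
    for _ in range(region_len):
        nxt: dict[int, float] = {}
        for mask, prob in states.items():
            shifted = ((mask << 1) | 1) & full  # age placements, start a new one
            for new_mask, w in ((shifted, p), (shifted & ~kill, 1.0 - p)):
                if (new_mask >> (span - 1)) & 1:  # oldest placement survived every offset
                    hit += prob * w
                else:
                    nxt[new_mask] = nxt.get(new_mask, 0.0) + prob * w
        states = nxt
    return hit


if __name__ == "__main__":
    seeds = {"spaced": "111010010100110111", "contiguous": "11111111111"}
    for p in (0.6, 0.7, 0.8, 0.9):
        row = {name: round(seed_sensitivity(s, p, 64), 3) for name, s in seeds.items()}
        print(p, row)
```

Absorbing the probability mass as soon as a placement completes means each region is counted once, so the result is the probability of at least one hit rather than the expected number of hits.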
11.
Brown DG. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. 2005, 2(1):29-38
We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds or ungapped alignment seeds to reduce noise hits. We model picking a set of seed models as an integer programming problem and give algorithms to choose such a set of seeds. While the problem is NP-hard, and Quasi-NP-hard to approximate to within a logarithmic factor, it can be solved easily in practice. A good set of seeds we have chosen allows four to five times fewer false positive hits, while preserving essentially identical sensitivity as BLASTP.
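Under the standard vector-seed definition (a weight vector v and a threshold T that hit an ungapped alignment wherever the weighted sum of consecutive column scores reaches T), hit detection for a collection of seeds can be sketched as below. The column scores would come from a substitution matrix such as BLOSUM62; the example scores, vectors, and thresholds are made up for illustration, and the integer-programming seed selection described above is not implemented.

```python
def vector_seed_hits(scores: list[int], v: list[int], threshold: int) -> list[int]:
    """Offsets i where sum_j v[j] * scores[i + j] >= threshold."""
    span = len(v)
    return [
        i
        for i in range(len(scores) - span + 1)
        if sum(vj * scores[i + j] for j, vj in enumerate(v)) >= threshold
    ]


def any_seed_hits(scores: list[int], seeds: list[tuple[list[int], int]]) -> bool:
    """An alignment is found if at least one (vector, threshold) seed hits it."""
    return any(vector_seed_hits(scores, v, t) for v, t in seeds)
```

For the column scores `[5, -1, 4, 6, -2, 3]`, the seed `([1, 0, 1, 1], 12)` hits at offset 0 (5 + 4 + 6 = 15), while the contiguous seed `([1, 1, 1], 10)` never reaches its threshold.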
12.
Zhang L. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. 2007, 4(3):496-505
In homology search, good spaced seeds have higher sensitivity for the same cost (weight). However, the mechanism that gives spaced seeds their power and the characterization of optimal spaced seeds remain open questions. This paper investigates these two questions by formally analyzing the average number of non-overlapping hits and the hit probability of a spaced seed in the Bernoulli sequence model. We prove that when the length of a non-uniformly spaced seed is bounded above by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed of the same weight in both 1) the average number of non-overlapping hits and 2) the asymptotic hit probability. This answers the first question in the Bernoulli sequence model. The theoretical study in this paper also gives a new approach to finding long optimal seeds.
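To make the quantities above concrete, the sketch below counts non-overlapping hits of a seed in a 0/1 Bernoulli sequence (1 = match, with probability p) and estimates their average by simulation. Counting greedily from the left, so that a hit is counted only if it starts after the previously counted hit ends, is our assumption about the definition; the paper's analysis is exact, while this is only a Monte Carlo estimate with a fixed random seed. All function names are hypothetical.

```python
import random


def hit_positions(bits: list[int], seed: str) -> list[int]:
    """Offsets where every care ('1') position of `seed` lands on a match."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    return [i for i in range(len(bits) - len(seed) + 1) if all(bits[i + k] for k in care)]


def non_overlapping_hits(bits: list[int], seed: str) -> int:
    """Greedy left-to-right count of hits whose spans do not overlap."""
    count, next_free = 0, 0
    for i in hit_positions(bits, seed):
        if i >= next_free:
            count += 1
            next_free = i + len(seed)
    return count


def average_non_overlapping_hits(seed: str, p: float, n: int,
                                 trials: int = 10000, rng_seed: int = 0) -> float:
    """Monte Carlo estimate over random Bernoulli(p) sequences of length n."""
    rng = random.Random(rng_seed)
    total = 0
    for _ in range(trials):
        bits = [1 if rng.random() < p else 0 for _ in range(n)]
        total += non_overlapping_hits(bits, seed)
    return total / trials
```

On an all-match sequence of length 6, the contiguous seed `"11"` has 3 non-overlapping hits and the spaced seed `"101"` has 2, even though both hit at every admissible offset.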
13.
A new method for homology search of DNA sequences is proposed. It can find extensive but weak homologies containing point mutations and deletions. The program compares sequences at least two orders of magnitude faster than dynamic programming algorithms, which makes it possible to search for homologies throughout the nucleotide sequence databank on a personal computer.
14.
Background
Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time-consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but it is unknown at what cost.
15.
Homologous recombination, an essential process for preserving genomic integrity, uses intact homologous sequences to repair broken chromosomes. To explore the mechanism of homologous pairing in vivo, we tagged two homologous loci in diploid yeast Saccharomyces cerevisiae cells and investigated their dynamic organization in the absence and presence of DNA damage. When neither locus is damaged, homologous loci occupy largely separate regions, exploring only 2.7% of the nuclear volume. Following the induction of a double-strand break, homologous loci co-localize ten times more often. The mobility of the cut chromosome markedly increases, allowing it to explore a nuclear volume that is more than ten times larger. Interestingly, the mobility of uncut chromosomes also increases, allowing them to explore a four times larger volume. We propose a model for homology search in which increased chromosome mobility facilitates homologous pairing. Finally, we find that the increase in DNA dynamics is dependent on early steps of homologous recombination.
16.
Frith MC. PloS one. 2011, 6(12):e28819
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with "gentle" masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0,S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to "harsh" masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
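The gentle-masking rule can be sketched in a few lines. In the sketch below, lowercase letters mark masked positions (an assumption about the input convention), the base scores are a simple +1 match / -1 mismatch scheme chosen for illustration, and the alignment is restricted to a single ungapped diagonal scored with Kadane's maximum-segment method. Any column involving a masked letter contributes min(0, S), so masked matches stop adding score while masked mismatches are still penalized. The function names are hypothetical.

```python
def column_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gentle: bool = True) -> int:
    """Score one aligned column; lowercase letters are treated as masked."""
    s = match if a.upper() == b.upper() else mismatch
    if gentle and (a.islower() or b.islower()):
        return min(0, s)  # gentle masking: min(0, S)
    return s


def best_ungapped_score(x: str, y: str, **kwargs) -> int:
    """Maximum-scoring segment along the main diagonal (Kadane's algorithm)."""
    best = cur = 0
    for a, b in zip(x, y):
        cur = max(0, cur + column_score(a, b, **kwargs))
        best = max(best, cur)
    return best
```

Aligning `"ACGTatatatat"` with itself gives a best ungapped score of 12 without masking (`gentle=False`) and 4 with gentle masking, because the low-complexity tail no longer contributes.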
17.
Jaina Mistry, Robert D. Finn, Sean R. Eddy, Alex Bateman, Marco Punta. Nucleic acids research. 2013, 41(12):e121
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13 000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.
18.
New human and mouse microRNA genes found by homology search (total citations: 2; self-citations: 0; citations by others: 2)
Weber MJ. The FEBS journal. 2005, 272(1):59-73
19.
20.
Neurospora crassa and Humicola lanuginosa cytochromes c: more homology in the heme region (total citations: 4; self-citations: 0; citations by others: 4)
Neurospora crassa and Humicola lanuginosa cytochromes c were subjected to automated Edman degradation. Residue 16 was found to be a glutamine, as we had predicted (1), and not a glutamic acid, as published for both proteins (4,7). Moreover, residues 19 to 26 had been placed in the wrong order in both cases. The corrected order shows more homology with other cytochromes c in this region.