Similar Literature
20 similar documents found (search time: 15 ms)
1.
MOTIVATION: Sensitive detection and masking of low-complexity regions in protein sequences. Filtered sequences can be used in sequence comparison without the risk of matching compositionally biased regions. The main advantage of the method over similar approaches is that it selectively masks single residue types without affecting other, possibly important, regions. RESULTS: A novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith-Waterman comparison of the query sequence against twenty homopolymers with infinite gap penalties. The algorithm outputs both the masked query sequence for further analysis (e.g. database searches) and the low-complexity regions themselves. Detection of low-complexity regions is highly specific for single residue types. It is shown that this approach is sufficient for masking database query sequences without generating false positives. The algorithm is benchmarked against widely available algorithms using the 210 genes of Plasmodium falciparum chromosome 2, a dataset known to contain a large number of low-complexity regions. AVAILABILITY: CAST (version 1.0) executable binaries are available to academic users free of charge under license. Web site entry point, server and additional material: http://www.ebi.ac.uk/research/cgg/services/cast/
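A minimal sketch of the homopolymer-comparison idea, assuming a simple match/mismatch score in place of the substitution matrix CAST actually uses; the scores, the threshold, and the function names are hypothetical. Because the homopolymer contains a single residue type and gaps are forbidden (infinite gap penalty), the Smith-Waterman local alignment of the query against it reduces to finding the maximum-scoring contiguous segment of per-residue scores, which one linear scan finds. Repeating the scan per residue type until no segment passes the threshold plays the role of the "multiple passes".

```python
# Hypothetical scores standing in for CAST's substitution matrix.
MATCH, MISMATCH = 3, -2
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def best_segment(seq, residue):
    """Best ungapped local alignment of seq against a homopolymer of `residue`.

    With an infinite gap penalty no gaps can open, so Smith-Waterman reduces
    to a maximum-scoring-segment (Kadane) scan. Returns (score, start, end)
    with end exclusive; a score of 0 means no positive segment exists.
    """
    best, best_start, best_end = 0, 0, 0
    run, run_start = 0, 0
    for i, aa in enumerate(seq):
        if aa == "X":  # already-masked positions cannot be aligned again
            run = 0
            continue
        s = MATCH if aa == residue else MISMATCH
        if run <= 0:
            run, run_start = s, i
        else:
            run += s
        if run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best, best_start, best_end


def mask_low_complexity(seq, threshold=15):
    """Mask biased segments one residue type at a time (hypothetical cutoff)."""
    seq = list(seq)
    regions = []
    for residue in AMINO_ACIDS:  # one homopolymer per residue type
        while True:  # repeat until no segment for this residue scores high enough
            score, start, end = best_segment(seq, residue)
            if score < threshold:
                break
            regions.append((residue, start, end, score))
            seq[start:end] = ["X"] * (end - start)
    return "".join(seq), regions
```

For example, `mask_low_complexity("MSTE" + "Q" * 10 + "LKDA")` masks only the poly-Q run and reports it as a Q region, leaving the flanking residues untouched, which mirrors the residue-selective masking the abstract describes.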

2.
A new method of homology search between DNA sequences is proposed. It can be used to find extensive but weak homologies that contain point mutations and deletions. Comparing sequences takes at least four orders of magnitude less computer time than with dynamic programming algorithms, which makes it possible to search the entire nucleotide database on personal computers. Some results of homology searches are presented.

3.
Biological sequences are often analyzed by detecting homologous regions between them. Homology search is confounded by simple repeats, which give rise to strong similarities that are not homologies. Standard repeat-masking methods fail to eliminate this problem, and they are especially ill-suited to AT-rich DNA such as malaria and slime-mould genomes. We present a new repeat-masking method, tantan, which is motivated by the mechanisms that create simple repeats. This method thoroughly eliminates spurious homology predictions for DNA–DNA, protein–protein and DNA–protein comparisons. Moreover, it enables accurate homology search for non-coding DNA with extreme A + T composition.

4.
A new method for homology search of DNA sequences is proposed. It can be used to find extensive but weak homologies that contain point mutations and deletions. The program's running time for comparing sequences is at least two orders of magnitude less than that of dynamic programming algorithms, which makes it possible to search the entire nucleotide database on personal computers.

5.
Expressed sequence tags (ESTs) are partial cDNA sequences read from either end of randomly selected expressed gene fragments and are used to discover new genes. DNA libraries from four different developmental stages of Schistosoma mansoni were used in this study and generated 141 ESTs, representing about 2.5% of the S. mansoni sequences in dbEST. Sequencing was done by the dideoxy chain termination method. The sequences were submitted to GenBank and searched for homology against the nonredundant databases using the Basic Local Alignment Search Tool for DNA (BLASTN) and protein (BLASTX) alignment at the National Center for Biotechnology Information (NCBI). Of the submitted ESTs, 29 were derived from the λgt11 sporocyst library, 70 from the λZAP adult worm library, 31 from the λZAP cercarial library, and 11 from the λZAP female B worm library. The homology search revealed that eight (5.6%) ESTs shared homology with previously identified S. mansoni genes in dbEST, 15 (10.6%) were homologous to known genes in other organisms, 116 (81.7%) showed no significant sequence homology in the databases, and the remaining sequences (2.1%) showed low homology to rRNA or mitochondrial DNA sequences. Thus, of the 141 ESTs studied, 116 are derived from novel, uncharacterized S. mansoni genes. These 116 ESTs are important for identifying coding regions in the sequences, mapping the schistosome genome, and identifying genes of immunological and pharmacological significance.

6.
B. Bornet, C. Muller, F. Paulus, M. Branchard. Génome, 2002, 45(5): 890-896
Inter simple sequence repeat (ISSR) sequences used as molecular markers can reveal polymorphism and also offer a new approach to studying SSR distribution and frequency. In this study, ISSR amplification with nonanchored primers was performed in closely related cauliflower lines. Forty-four different amplified fragments were sequenced. The PCR products are delimited by the expected motifs and numbers of repeats, which validates the ISSR nonanchored-primer amplification technique. DNA and amino acid homology searches between the internal sequences and databases (i) showed that most internal ISSR regions had homology with known sequences, mainly genes coding for proteins implicated in DNA interaction or gene expression, reflecting the significance of the amplified ISSR sequences, and (ii) revealed long and numerous homologies with the Arabidopsis thaliana genome. ISSR amplifications revealed high conservation of these sequences between Arabidopsis thaliana and Brassica oleracea var. botrytis. Thirty-four of the 44 ISSRs contained one or several perfect or imperfect internal microsatellites. This distribution indicates the presence in genomes of regions with highly concentrated SSRs, or "SSR hot spots." Among the four nonanchored primers used in this study, trinucleotide repeats, and especially (CAA)5, were the most powerful primers for ISSR amplification in terms of the number of amplified bands, the level of polymorphism, and their nature.

7.
8.
Two-dimensional graphic analysis of DNA sequence homologies. (Cited 9 times: 3 self-citations, 6 by others)
We describe a computer program designed to facilitate pattern-matching analysis of homologies between DNA sequences. It uses a two-dimensional plot to simplify the evaluation of significant structures inherent in the sequences. The program has three parts: (i) an algorithm for homology search, (ii) a two-dimensional graphic display of the result, and (iii) further graphic treatment to enhance significant structures. The power of the graphic display is demonstrated by the following application. We searched for direct repeats in the mouse immunoglobulin kappa-chain genes. Both the five J DNA sequences and other, shorter repeats were found. We also found a longer stretch of homology that could indicate duplicated DNA in the J4-J5 region.
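A minimal dot-plot sketch of the kind of two-dimensional homology display described above, assuming a sliding-window filter; the window length, the minimum-match stringency, and the function names are hypothetical, not taken from the program itself. A dot is placed at (i, j) when the windows starting at i and j agree at enough positions. Comparing a sequence with itself and ignoring the main diagonal exposes direct repeats as off-diagonal runs of dots, grouped here by diagonal offset.

```python
def dot_plot(a, b, window=8, min_matches=7):
    """Set of (i, j) where a[i:i+window] and b[j:j+window] agree at
    >= min_matches positions (hypothetical window and stringency)."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def direct_repeats(seq, window=8, min_matches=7):
    """Self-comparison: keep dots above the main diagonal (j > i) and group
    their row positions by diagonal offset d = j - i."""
    by_offset = {}
    for i, j in dot_plot(seq, seq, window, min_matches):
        if j > i:
            by_offset.setdefault(j - i, []).append(i)
    return {d: sorted(rows) for d, rows in by_offset.items()}


def render(a, b, dots):
    """Crude text display: rows follow a, columns follow b, '*' marks a dot."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(b)))
        for i in range(len(a))
    )
```

With `seq = "GATTACAGGC" + "TTTT" + "GATTACAGGC"`, `direct_repeats(seq)[14]` lists the rows where the first copy lines up with the second, 14 bases downstream; the window filter is what suppresses isolated single-base matches so that only diagonal runs stand out.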

9.
Nucleic Acid Homologies Among Species of Saccharomyces (Cited 19 times: 4 self-citations, 15 by others)
Evolutionary divergence among species of the yeast genus Saccharomyces was estimated from measurements of deoxyribonucleic acid (DNA)/DNA and ribosomal ribonucleic acid (RNA)/DNA homology. Much diversity was found in the DNA base sequences, with several species showing little or no homology to the three reference species, S. cerevisiae, S. lactis, and S. fragilis. These three reference species also showed little or no homology to each other. On the other hand, the diversity among ribosomal RNA base sequences was small, since most species showed a high degree of homology to the reference species. The arrangement of species based on ribosomal RNA homologies agrees in most cases with current taxonomic groupings. A yeast hybrid (S. fragilis × S. lactis) was shown to contain two nonhomologous genomes. A minimum genome size of 9.2 × 10^9 daltons for S. cerevisiae was calculated from the rate of DNA renaturation.

10.
Meiotic silencing by unpaired DNA (MSUD) is a process that detects unpaired regions between homologous chromosomes and silences them for the duration of sexual development. While the phenomenon of MSUD is well recognized, the process that detects unpaired DNA is poorly understood. In this report, we provide two lines of evidence linking unpaired DNA detection to a physical search for DNA homology. First, we have found that a putative SNF2-family protein (SAD-6) is required for efficient MSUD in Neurospora crassa. SAD-6 is closely related to Rad54, a protein known to facilitate key steps in the repair of double-strand breaks by homologous recombination. Second, we have successfully masked unpaired DNA by placing identical transgenes at slightly different locations on homologous chromosomes. This masking falls apart when the distance between the transgenes is increased. We propose a model where unpaired DNA detection during MSUD is achieved through a spatially constrained search for DNA homology. The identity of SAD-6 as a Rad54 paralog suggests that this process may be similar to the searching mechanism used during homologous recombination.

11.
A wide-ranging examination of plastid (pt)DNA sequence homologies within higher plant nuclear genomes (promiscuous DNA) was undertaken. Digestion with methylation-sensitive restriction enzymes followed by Southern analysis was used to distinguish plastid and nuclear DNA in order to assess the variability of promiscuous sequences within and between plant species. Some species, such as Gossypium hirsutum (cotton), Nicotiana tabacum (tobacco), and Chenopodium quinoa, showed homogeneity of these sequences, while intraspecific sequence variation was observed among different cultivars of Pisum sativum (pea), Hordeum vulgare (barley), and Triticum aestivum (wheat). Hypervariability of plastid sequence homologies was identified in the nuclear genomes of Spinacia oleracea (spinach) and Beta vulgaris (beet), in which individual plants were shown to possess a unique spectrum of nuclear sequences with ptDNA homology. This hypervariability apparently extended to somatic variation in B. vulgaris. No sequences with ptDNA homology were identified by this method in the nuclear genome of Arabidopsis thaliana.

12.
MOTIVATION: Pairwise alignment of protein sequences and local similarity searches produce many false positives because of compositionally biased regions of amino acid residues, also called low-complexity regions (LCRs). Masking or filtering such regions significantly improves the reliability of homology searches and, consequently, of functional predictions. Most available algorithms are based on a statistical approach. We wished to investigate the structural properties of LCRs in biological sequences and develop an algorithm for filtering them. RESULTS: We present an algorithm that detects and masks LCRs in protein sequences to improve the quality of database searches. The algorithm is based on complexity analysis of subsequences delimited by a pair of identical, repeating subsequences. Given a protein sequence, the algorithm first computes the suffix tree of the sequence. It then collects repeating subsequences from the tree. Finally, it iteratively tests whether each subsequence delimited by a pair of repeating subsequences meets a given criterion. Test results on 1000 proteins from 20 Pfam families show that repeating subsequences are a good indicator of low-complexity regions, and that the algorithm based on such structural information competes strongly with others. AVAILABILITY: http://bioinfo.knu.ac.kr/research/CARD/ CONTACT: swshin@bioinfo.knu.ac.kr
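A minimal sketch of the "segment delimited by a pair of repeats" idea, with two loudly labeled substitutions: a dictionary of fixed-length k-mer positions stands in for the suffix tree (so it only finds repeats of one length k), and Shannon entropy stands in for the complexity criterion, which the abstract does not specify. The values of k and the entropy cutoff, and all function names, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(segment):
    """Bits per residue (assumed complexity measure); low = compositionally biased."""
    counts = Counter(segment)
    n = len(segment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def merge_intervals(intervals):
    """Merge overlapping or touching [start, end) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def repeat_delimited_lcrs(seq, k=3, max_entropy=2.0):
    """Flag segments running from one occurrence of a repeated k-mer to the
    next occurrence of the same k-mer when their entropy is low.

    A k-mer position table replaces the suffix tree used by the authors.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    flagged = []
    for occ in positions.values():
        for start, nxt in zip(occ, occ[1:]):  # consecutive occurrence pairs
            end = nxt + k  # the segment includes both copies of the k-mer
            if shannon_entropy(seq[start:end]) <= max_entropy:
                flagged.append((start, end))
    return merge_intervals(flagged)
```

On a sequence such as `"MKTAYIAKQR" + "PG" * 6 + "WLVHEDFN"`, only the PGPG... stretch contains repeated 3-mers, and every segment they delimit has at most 1 bit of entropy, so the merged result is the single interval covering that stretch.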

13.
Searching public protein databases is a fundamental step in selecting target proteins in structural and functional genomics, and in inferring protein structure, function, and evolution. Most database search methods use amino acid substitution matrices to score amino acid pairs, and the choice of substitution matrix strongly affects homology detection performance. We previously proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Here we further evaluate MIQS in combination with LAST, a fast heuristic database search tool with a tunable sensitivity parameter m, where larger m gives higher sensitivity. MIQS substantially improves the homology detection and alignment quality of LAST across diverse values of m. Against a protein database of approximately 15 million sequences, LAST with m = 10^5 achieves better homology detection than BLASTP and completes the search 20 times faster. Compared with the most sensitive methods in current use, CS-BLAST and SSEARCH, LAST with MIQS and m = 10^6 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. These results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search.

14.
Radioactive RNA with sequences complementary to human DNA satellite III was hybridised in situ to metaphase chromosomes of the chimpanzee (Pan troglodytes), the gorilla (Gorilla gorilla) and the orangutan (Pongo pygmaeus). A quantitative analysis of the radioactivity, and hence of the chromosomal distribution of human DNA satellite III equivalent sequences in the great apes, was undertaken, and the results compared with interspecies chromosome homologies based upon Giemsa banding patterns. In some instances DNA with sequence homology to human satellite III is present on the equivalent (homologous) chromosomes in identical positions in two or more species although quantitative differences are observed. In other cases there appears to be no correspondence between satellite DNA location and chromosome homology determined by banding patterns. These results differ from those found for most transcribed DNA sequences where the same sequence is located on homologous chromosomes in each species.

15.
Regulatory pattern identification in nucleic acid sequences (Cited 7 times: 5 self-citations, 2 by others)
In addition to the sequence homologies and statistical patterns identified among numerous genetic sequences, there are subtler classes of patterns for which most current computer search methods offer very limited utility. This class includes various presumptive eukaryotic regulatory sites. A critique of the often employed consensus and local homology methods suggests the need for new tools. In particular, such new methods should use the positional and structural data now becoming available on exactly what it is that is recognized in the DNA sequence by sequence-specific binding proteins.

16.
PatternHunter: faster and more sensitive homology search (Cited 15 times: 0 self-citations, 15 by others)
MOTIVATION: Genomics and proteomics studies routinely depend on homology searches based on the strategy of finding short seed matches which are then extended. The explosive growth of genomic data presents a dilemma for DNA homology search techniques: increasing the seed size decreases sensitivity, whereas decreasing it slows down computation. RESULTS: We present a new homology search algorithm, 'PatternHunter', that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed. At BLAST levels of sensitivity, PatternHunter can find homologies between sequences as large as human chromosomes in mere hours on a desktop. AVAILABILITY: PatternHunter is available at http://www.bioinformaticssolutions.com as a commercial package. It runs on all platforms that support Java. PatternHunter technology is being patented; commercial use requires a license from BSI, while non-commercial use will be free.
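A minimal sketch of spaced-seed hit detection, the seeding step the abstract contrasts with contiguous seeds; it leaves out PatternHunter's hit processing and extension entirely. The seed string is the weight-11, length-18 pattern commonly reported for PatternHunter ('1' = position must match, '0' = don't care); the function names are hypothetical. Unlike a contiguous 11-base seed, the spaced seed still fires when the only mismatches fall on '0' positions.

```python
from collections import defaultdict

SEED = "111010010100110111"  # weight 11 (eleven '1's), span 18


def seed_key(s, pos, seed=SEED):
    """Characters of s at the seed's '1' positions, with the seed placed at pos."""
    return "".join(s[pos + k] for k, c in enumerate(seed) if c == "1")


def spaced_seed_hits(query, target, seed=SEED):
    """Return (query_pos, target_pos) pairs whose seed keys are identical."""
    span = len(seed)
    index = defaultdict(list)  # seed key -> start positions in target
    for j in range(len(target) - span + 1):
        index[seed_key(target, j, seed)].append(j)
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(seed_key(query, i, seed), []):
            hits.append((i, j))
    return hits
```

In practice each hit would then be extended into a local alignment; the sketch only shows why a mismatch at a don't-care position (for instance the fourth base) does not destroy the seed match, while a mismatch at a '1' position does.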

17.
Summary: Sequences homologous to chloroplast DNA (ctDNA) have been found in the nuclear DNA of five species of the Chenopodiaceae, extending earlier observations of promiscuous DNA in Spinacia oleracea (Timmis and Scott 1983). Using the 7.7-kbp spinach ctDNA PstI fragment as a hybridization probe, several separately located homologies to ctDNA were resolved in the nuclear DNA of Beta vulgaris, Chenopodium quinoa, and Enchylaena tomentosa. In Chenopodium album and Atriplex cinerea, the major region of homology was a nuclear EcoRI fragment (6 kbp) indistinguishable from that in ctDNA. These homologies may therefore involve larger tracts of ctDNA, because the same restriction sites are apparently retained in the nucleus. This suggests that in these two species there is a contrasting, more homogeneous arrangement of ctDNA transpositions in the nucleus.

18.
H. Zhou, M. Zhou, D. Li, J. Manthey, E. Lioutikova, H. Wang, X. Zeng. BMC Genomics, 2017, 18(9): 826-38

Background

The beauty and power of the CRISPR-Cas9 endonuclease genome editing system lie in its RNA programmability: Cas9 can be guided to any genomic locus complementary to a 20-nt single guide RNA (sgRNA) to cleave double-stranded DNA, allowing the introduction of desired mutations. Unfortunately, it has been reported repeatedly that the sgRNA can also guide Cas9 to off-target sites whose DNA sequence is homologous to the sgRNA.

Results

Using the human genome and Streptococcus pyogenes Cas9 (SpCas9) as an example, this article mathematically analyzed the probability of off-target homologies for sgRNAs and found that, for genomes as large as the human genome, potential off-target homologies are unavoidable in sgRNA selection. A highly efficient computational algorithm was developed for whole-genome sgRNA design and off-target homology search. Using a dynamically constructed sequence-indexed database and a simplified sequence alignment method, the algorithm achieves very high efficiency while guaranteeing identification of all existing potential off-target homologies. With this algorithm, 1,876,775 sgRNAs were designed for the 19,153 human mRNA genes, and only two of them were found to be free of off-target homology.
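The paper's own algorithm is not reproduced here. As a hedged illustration of both halves of the argument, the sketch below first computes the expected number of sites within m mismatches of a 20-nt guide under a strong simplifying assumption (an i.i.d. uniform random genome, mismatches only), and then shows one way to make an off-target search exhaustive: a pigeonhole block index, in which a site with at most m mismatches must match at least one of m + 1 guide blocks exactly. The mismatch limit, the NGG PAM check for SpCas9, and all function names are assumptions; reverse-strand sites and bulges are ignored for brevity.

```python
from collections import defaultdict
from math import comb


def expected_offtargets(n_positions, max_mm, length=20):
    """Expected sites within max_mm mismatches of one guide, assuming
    n_positions independent, uniformly random length-nt sites."""
    n_close = sum(comb(length, k) * 3 ** k for k in range(max_mm + 1))
    return n_positions * n_close / 4 ** length


def build_block_index(genome, length=20, max_mm=3):
    """Index every genome window by each of its max_mm + 1 blocks
    (a real tool would build this once per genome)."""
    nblocks = max_mm + 1
    bounds = [b * length // nblocks for b in range(nblocks + 1)]
    index = defaultdict(list)  # (block number, block text) -> window starts
    for p in range(len(genome) - length + 1):
        for b in range(nblocks):
            lo, hi = bounds[b], bounds[b + 1]
            index[(b, genome[p + lo:p + hi])].append(p)
    return index, bounds


def find_offtargets(guide, genome, index, bounds, max_mm=3):
    """Forward-strand sites within max_mm mismatches of guide, followed by NGG.

    max_mm must match the value used to build the index; by the pigeonhole
    principle the block lookups then miss no qualifying site."""
    candidates = set()
    for b in range(len(bounds) - 1):
        lo, hi = bounds[b], bounds[b + 1]
        candidates.update(index.get((b, guide[lo:hi]), []))
    n = len(guide)
    hits = []
    for p in sorted(candidates):
        mm = sum(x != y for x, y in zip(genome[p:p + n], guide))
        pam = genome[p + n:p + n + 3]
        if mm <= max_mm and len(pam) == 3 and pam.endswith("GG"):
            hits.append((p, mm))
    return hits
```

Under the random-genome assumption, `expected_offtargets(6e9, 3)` (roughly both strands of a human-sized genome, up to three mismatches) comes to about 178 candidate sites per guide before any PAM filtering, which is consistent in spirit with the conclusion that off-target homologies are hard to avoid.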

Conclusions

Using the novel and efficient sgRNA homology search algorithm introduced in this article, genome-wide sgRNA design and off-target analysis were conducted, and the results confirmed the mathematical analysis that it is almost impossible for an sgRNA sequence to escape potential off-target homologies. Future innovations in CRISPR-Cas9 gene editing technology need to focus on eliminating Cas9 off-target activity.

19.
The extent of homology between two sequences is generally expressed quantitatively by using a set of rules that assigns to each aligned residue pair a numerical value depending on some measure of the similarity between the pair. A homology score for the alignment is obtained by summing the value for each pair, possibly adding an appropriate penalty for insertions and deletions. Whatever rules are used to arrive at the best homology score, the fundamental question remains whether the score implies that the sequences are in some way related. In the absence of biological criteria, this question is generally given a preliminary answer by a Monte Carlo evaluation of the statistical significance of the score. By randomizing the two sequences a large number of times and recording the homology score each time, the probability of a given score can be obtained for each sequence, and hence the likelihood that the observed homology could have occurred by chance for sequences of the given composition is easily assessed. In addition to this statistical assessment, however, a second statistical question must also be evaluated, particularly for short homologous stretches (<15 residues). When homology searches are performed against databases containing several hundred thousand residues, an important number to know is the probability that the homologous sequence would occur simply as a result of the large number of arrangements of short sequences present in any large collection of disparate sequences. In this note we evaluate different versions of this question using different sets of rules. Our results indicate that homologies of roughly six or more residues out of ten are statistically significant, and that the best procedure for finding homologies and determining their significance is to use the Dayhoff mutation matrix, assigning a weight of at least -8 as the penalty for gaps.
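A minimal sketch of the Monte Carlo procedure described above: score the real alignment, shuffle both sequences many times (shuffling preserves composition), and report the empirical probability of reaching the observed score together with a z-score. The scoring is a global alignment with a simple match/mismatch score and a linear gap penalty, standing in for the Dayhoff matrix, whose values are not reproduced here; the scores, shuffle count, seed, and function names are hypothetical.

```python
import random
import statistics


def align_score(a, b, match=2, mismatch=-1, gap=-8):
    """Needleman-Wunsch global alignment score with a linear gap penalty
    (match/mismatch values are stand-ins for a substitution matrix)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def monte_carlo_significance(a, b, n=500, seed=0):
    """Return (observed score, empirical p-value, z-score) from n shuffles."""
    rng = random.Random(seed)
    observed = align_score(a, b)
    scores = []
    for _ in range(n):
        sa, sb = list(a), list(b)
        rng.shuffle(sa)  # same composition, random order
        rng.shuffle(sb)
        scores.append(align_score("".join(sa), "".join(sb)))
    # Empirical P(shuffled score >= observed), with +1 so it is never exactly 0
    p = (1 + sum(s >= observed for s in scores)) / (n + 1)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    z = (observed - mean) / sd if sd > 0 else float("inf")
    return observed, p, z
```

The default gap weight of -8 simply mirrors the penalty recommended in the abstract; swapping in a real substitution matrix would only change how `s` is looked up. The second question the abstract raises, chance occurrence of short stretches across a whole database, is not addressed by this per-pair test.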

20.