Similar articles
2.
DNA microarray technology, originally developed to measure the level of gene expression, has become one of the most widely used tools in genomic study. The crux of microarray design lies in how to select a unique probe that distinguishes a given genomic sequence from other sequences. Because of its significance, probe selection has attracted a lot of attention, and various probe selection algorithms have been developed in recent years. Good probe selection algorithms should produce a small number of candidate probes. Efficiency is also crucial because the data involved are usually huge. Most existing algorithms are not sufficiently selective and return quite a large number of probes. We propose a new direction to tackle the problem and give an efficient algorithm based on randomization to select a small set of probes, and we demonstrate that such a small set of probes is sufficient to distinguish each sequence from all the other sequences. Based on the algorithm, we have developed probe selection software, RandPS, which runs efficiently in practice. The software is available on our website (http://www.csc.liv.ac.uk/~cindy/RandPS/RandPS.htm). We test our algorithm via experiments on different genomes (Escherichia coli, Saccharomyces cerevisiae, etc.), and our algorithm is able to output unique probes for most of the genes efficiently. The other genes can be identified by a combination of at most two probes.

4.
MOTIVATION: Expressed sequence tag (EST) databases have grown exponentially in recent years and now represent the largest collection of genetic sequences. An important application of these databases is that they contain information useful for the design of gene-specific oligonucleotides (or simply, oligos) that can be used in PCR primer design, microarray experiments and genomic library screening. RESULTS: In this paper, we study two complementary problems concerning the selection of short oligos, e.g. 20-50 bases, from a large database of tens of thousands of ESTs: (i) selection of oligos each of which appears (exactly) in one unigene but does not appear (exactly or approximately) in any other unigene and (ii) selection of oligos that appear (exactly or approximately) in many unigenes. The first problem is called the unique oligo problem and has applications in PCR primer and microarray probe designs, and library screening for gene-rich clones. The second is called the popular oligo problem and is also useful in screening genomic libraries. We present an efficient algorithm to identify all unique oligos in the unigenes and an efficient heuristic algorithm to enumerate the most popular oligos. By taking into account the distribution of the frequencies of the words in the unigene database, the algorithms have been engineered carefully to achieve remarkable running times on regular PCs. Each of the algorithms takes only a couple of hours (on a 1.2 GHz CPU, 1 GB RAM machine) to run on a 28 Mb dataset of barley unigenes from the HarvEST database. We present simulation results on synthetic data and a preliminary analysis of the barley unigene database. AVAILABILITY: Available on request from the authors.
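
The exact-match half of the unique oligo problem stated above can be illustrated with a brute-force sketch. This is not the authors' algorithm: it ignores approximate matches and does not use word-frequency statistics. The function name `unique_oligos` and the toy unigenes are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def unique_oligos(unigenes, length):
    """Map each unigene name to the oligos of the given length that occur
    (exactly) in that unigene and in no other unigene."""
    owners = defaultdict(set)  # oligo -> names of the unigenes containing it
    for name, seq in unigenes.items():
        for i in range(len(seq) - length + 1):
            owners[seq[i:i + length]].add(name)
    result = {name: [] for name in unigenes}
    for oligo, names in owners.items():
        if len(names) == 1:  # seen in exactly one unigene
            result[next(iter(names))].append(oligo)
    return {name: sorted(oligos) for name, oligos in result.items()}

# Toy data (hypothetical): three short "unigenes" and oligos of length 4.
toy = {"u1": "ACGTAC", "u2": "CGTACG", "u3": "TTTTTT"}
unique = unique_oligos(toy, 4)
```

A dictionary of all words is feasible only at toy scale; at the scale described in the abstract (tens of thousands of ESTs, oligos of 20-50 bases) a more careful index would be needed.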

5.
Modern cultivated barley is an important cereal crop with an estimated genome size of 5000 Mb. To develop the resources for positional cloning and structural genomic analyses in barley, we constructed a bacterial artificial chromosome (BAC) library for the cultivar Morex using the cloning enzyme HindIII. The library contains 313344 clones (816 384-well plates). A random sampling of 504 clones indicated an average insert size of 106 kbp (range=30–195 kbp) and 3.4% empty vectors. Screening the colony filters for chloroplast DNA content indicated an exceptionally low 1.5% contamination with chloroplast DNA. Thus, the library provides 6.3 haploid genome equivalents allowing a >99% probability of recovering any specific sequence of interest. High-density filters were gridded robotically using a Genetix Q-BOT in a 4×4 double-spotted array on 22.5-cm2 filters. Each set of 17 filters allows the entire library to be screened with 18432 clones represented per filter. Screening the library with 40 single copy probes identified an average 6.4 clones per probe, with a range of 1–13 clones per probe. A set of resistance-gene analog (RGA) sequences identified 121 RGA-containing BAC clones representing 20 different regions of the genome with an average of 6.1 clones per locus. Additional screening of the library with a P-loop disease resistance primer probe identified 459 positive BAC clones. These data indicate that this library is a valuable resource for structural genomic applications in barley. Received: 20 September 1999 / Accepted: 25 March 2000
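
The coverage figures in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that empty vectors and chloroplast clones are both subtracted before computing coverage, and that the recovery probability follows the usual Poisson approximation P = 1 - exp(-coverage). Neither assumption is stated in the abstract, although both reproduce its numbers.

```python
import math

clones = 313_344             # 816 plates x 384 wells
mean_insert_kbp = 106
empty_fraction = 0.034       # clones with empty vectors
chloroplast_fraction = 0.015
genome_mb = 5000

# Assumption: only clones carrying nuclear DNA inserts contribute to coverage.
usable_clones = clones * (1 - empty_fraction - chloroplast_fraction)
coverage = usable_clones * mean_insert_kbp / 1000 / genome_mb

# Assumption: Poisson approximation for recovering a given sequence.
p_recover = 1 - math.exp(-coverage)

clones_per_filter = clones // 17
print(round(coverage, 1), round(p_recover, 4), clones_per_filter)
```

Under these assumptions the coverage comes out at about 6.3 genome equivalents and the recovery probability exceeds 99%, matching the stated values, and 313344 / 17 gives the 18432 clones per filter.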

9.
Previous studies have shown that the identification and analysis of both abundant and rare k-mers or “DNA words of length k” in genomic sequences using suitable statistical background models can reveal biologically significant sequence elements. Other studies have investigated the uni/multimodal distribution of k-mer abundances or “k-mer spectra” in different DNA sequences. However, the existing background models are affected to varying extents by compositional bias. Moreover, the distribution of k-mer abundances in the context of related genomes has not been studied previously. Here, we present a novel statistical background model for calculating k-mer enrichment in DNA sequences based on the average of the frequencies of the two (k-1)-mers for each k-mer. Comparison of our null model with the commonly used ones, including Markov models of different orders and the single mismatch model, shows that our method is more robust to compositional AT-rich bias and detects many additional, repeat-poor over-abundant k-mers that are biologically meaningful. Analysis of overrepresented genomic k-mers (4≤k≤16) from four yeast species using this model showed that the fraction of overrepresented DNA words falls linearly as k increases; however, a significant number of overabundant k-mers exists at higher values of k. Finally, comparative analysis of k-mer abundance scores across four yeast species revealed a mixture of unimodal and multimodal spectra for the various genomic sub-regions analyzed.

10.
Cloned human apo-C-II cDNA was used as a hybridization probe to identify the human apo-C-II gene in a genomic library constructed in our laboratory. The isolated apo-C-II DNA was studied both by electron microscopy and by direct sequence analysis. Ultrastructural morphological analysis of RNA-DNA hybrids revealed that the apo-C-II gene had complex structures because of regions of inverted complementary sequences in and around the gene forming stem-and-loop structures which interfere with the formation of stable RNA:DNA hybrids. Extensive morphological analysis revealed a minimum of 3 intervening sequences (IVS), and their lengths were measured. Direct sequence analysis of the cloned gene confirmed the presence of 3 IVS. There are 4 Alu type sequences in IVS-I. We sequenced 4340 nucleotides which include 545 nucleotides in the 5' flanking region, the entire gene which spans 3320 nucleotides, and 475 nucleotides in the 3' flanking region which also encompasses an additional Alu sequence. The 5' end of the gene was identified by primer extension and sequencing of the primer extended cDNA. Apo-C-II mRNA structure was deduced from the cDNA sequence, the primer extension experiments, and the genomic sequence. It is 494 nucleotides in length. Its sequence differs from previously published sequences in that there are 7 additional nucleotides before the polyadenylate tail. In the 5' flanking region, nucleotides -234 to -213 encompass a GC-rich region which exhibits high homology (greater than 70%) to the 5' flanking regions of the genes of all the apolipoproteins published to date, namely, apo-A-II (-497 to -471), apo-A-I (approximately -196 to -179), apo-E (-409 to -391), and apo-C-III (approximately -116 to -103). This highly conserved region might represent some evolutionarily conserved sequences from these related genes and/or might represent a region with regulatory function.

12.
Octamer sequencing technology (OST) is a primer-directed sequencing strategy in which an individual octamer primer is selected from a pre-synthesized octamer primer library and used to sequence a DNA fragment. However, selecting candidate primers from such a library is time consuming and can be a bottleneck in the sequencing process. To accelerate the sequencing process and to obtain high quality sequencing data, a computer program, electronic OST or eOST, was developed to automatically identify candidate primers from an octamer primer library. eOST integrates the base calling software PHRED to provide a quality assessment for target sequences and identifies potential primer binding sites located within a high quality target region. To increase the sequencing success rate, eOST includes a simple dynamic folding algorithm to automatically calculate the free energy and predict the secondary structure within the template in the vicinity of the octamer-binding site. Several parameters were found to be important, including base quality threshold, the window size of the template sequence segment, and the threshold ΔG value. OST, coupled with the eOST software, can be used to sequence short DNA fragments or in the finishing assembly stage of large-scale sequencing of genomic DNA.

13.
A method is described for quickly and reproducibly isolating genomic DNA contiguous with known DNA sequence by means of the polymerase chain reaction (PCR). Flanking genomic DNA is isolated using a biotinylated sequence-specific primer in combination with a generic hybrid primer that binds to a deoxyoligonucleotide sequence artificially added to the ends of the genomic DNA. Amplified sequences that include the biotinylated primer are purified from nonbiotinylated amplification products by binding to a solid-phase streptavidin matrix. The biotinylated amplification product(s) are subjected to a further round of amplification, after which they can be subcloned and analyzed. This technique was applied to the isolation of three intron-exon junctions. Verification of the identity of these junction sequences was accomplished by designing primers based on the intron sequences isolated by Biotin-RAGE, amplifying across the exon using these intron primers, and sequencing the PCR-generated product.

15.
Sequence alignment by cross-correlation. (Total citations: 1; self-citations: 0; citations by others: 1)
Many recent advances in biology and medicine have resulted from DNA sequence alignment algorithms and technology. Traditional approaches for the matching of DNA sequences are based either on global alignment schemes or heuristic schemes that seek to approximate global alignment algorithms while providing higher computational efficiency. This report describes an approach using the mathematical operation of cross-correlation to compare sequences. It can be implemented using the fast Fourier transform for computational efficiency. The algorithm is summarized and sample applications are given. These include gene sequence alignment in long stretches of genomic DNA, finding sequence similarity in distantly related organisms, demonstrating sequence similarity in the presence of massive (approximately 90%) random point mutations, comparing sequences related by internal rearrangements (tandem repeats) within a gene, and investigating fusion proteins. Application to RNA and protein sequence alignment is also discussed. The method is efficient, sensitive, and robust, being able to find sequence similarities where other alignment algorithms may perform poorly.
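
The idea of comparing sequences by cross-correlation can be sketched as follows. This is a minimal illustration, not the implementation from the report: it evaluates the correlation directly in O(n·m) time instead of with the fast Fourier transform, and the function names and toy sequences are hypothetical. For each relative shift it counts matching bases, which equals the sum of the cross-correlations of the four per-base indicator vectors.

```python
def match_profile(a, b):
    """Cross-correlate two DNA strings: for every relative shift of b
    against a, count the positions where both carry the same base."""
    profile = {}
    for shift in range(-(len(b) - 1), len(a)):
        score = 0
        for j, base in enumerate(b):
            i = shift + j  # position in a that b[j] is laid over
            if 0 <= i < len(a) and a[i] == base:
                score += 1
        profile[shift] = score
    return profile

def best_shift(a, b):
    """Shift with the highest match count (ties go to the smaller |shift|)."""
    profile = match_profile(a, b)
    return max(profile, key=lambda s: (profile[s], -abs(s)))

# Toy data (hypothetical): locate a short "gene" in a longer stretch.
genomic = "TTTTACGTGCATTTT"
gene = "ACGTGCA"
shift = best_shift(genomic, gene)
```

Replacing the double loop with FFT-based correlation of the indicator vectors gives the same profile in O(n log n) time, which is the efficiency gain the abstract refers to.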

17.
Traditionally, primers for PCR detection of viruses have been selected from the genomic sequence of a single or representative viral strain. However, the high mutation rate of viral genomes often results in failure to detect viruses in clinical and environmental samples. Thus, it seems necessary to consider primers designed from multiple viral sequences in order to improve detection of viral variants. Matchup is a program intended to select universal primers from multiple sequences. Using Matchup, we designed primer pairs for HBV detection from 691 full genomic HBV DNA sequences available from the NCBI GenBank database. Thousands of primer candidates were initially extracted and these were sequentially filtered down to 5 primer pairs. These primer pairs were tested by PCR using sera from 5 Korean HBsAg(+) HBV patients, and eventually one universal primer pair was selected and named MUW (multiple-universal-worldwide). This primer pair, 3 HBV reference primer pairs reported by others and 1 commercial primer pair were compared using 86 HBsAg(+) HBV sera from Korean and Vietnamese patients. The detection rate for the MUW primer pair was 72.1%, much greater than those obtained with the reference and commercial primers (32.5 to 40.7%). The superiority of the MUW primer pair appeared to be correlated with the conserved sequences of the forward primer binding sites and the primer quality score. These results suggest that the universal primers designed by the Matchup program from multiple sequences could be useful in detecting viruses from clinical samples.
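
The first step of designing primers from multiple sequences, extracting candidates that are conserved across variants, might look like the sketch below. The abstract does not describe Matchup's actual filtering criteria or its primer quality score, so this only illustrates the general idea. The function name `conserved_kmers`, the threshold parameter and the toy variants are hypothetical.

```python
from collections import Counter

def conserved_kmers(sequences, k, min_fraction):
    """Return (k-mer, fraction) pairs for k-mers present in at least
    min_fraction of the input sequences, most widely shared first."""
    counts = Counter()
    for seq in sequences:
        # A set counts each k-mer at most once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    total = len(sequences)
    hits = [(kmer, n / total) for kmer, n in counts.items()
            if n / total >= min_fraction]
    return sorted(hits, key=lambda item: (-item[1], item[0]))

# Toy data (hypothetical): three variants differing by point mutations.
variants = ["ACGTTGCA", "ACGTAGCA", "TCGTTGCA"]
candidates = conserved_kmers(variants, 3, 1.0)
```

Lowering `min_fraction` below 1.0 would admit candidates that miss some variants, which is the trade-off a universal primer design has to weigh.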

18.
Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8 to 10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. The approach, kmerHMM, accepts raw PBM probe data and preprocesses them into median binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors’ websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM.
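
The preprocessing step mentioned in this abstract, reducing raw probe signals to a median binding intensity per k-mer and ranking the k-mers, can be sketched as below. This is an illustration under simplifying assumptions rather than the kmerHMM code: reverse complements are not merged, and the function name and toy probes are hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_kmer_intensities(probes, k):
    """probes: iterable of (probe_sequence, signal_intensity) pairs.
    Returns {k-mer: median intensity over all probes containing it}."""
    signals = defaultdict(list)
    for seq, intensity in probes:
        # Each probe contributes once per distinct k-mer it contains.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            signals[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in signals.items()}

# Toy data (hypothetical): three probes with made-up intensities.
toy_probes = [("ACGTA", 10.0), ("CGTAC", 30.0), ("GTACG", 20.0)]
medians = median_kmer_intensities(toy_probes, 3)

# Rank k-mers from strongest to weakest median signal.
ranked = sorted(medians, key=lambda kmer: (-medians[kmer], kmer))
```

The ranked list would then feed the alignment and HMM training stages, which are beyond the scope of this sketch.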

19.
We developed a completely homogeneous and isothermal method of detecting RNA sequences and demonstrated ultrarapid and direct quantification of pathogenic gene expression with high sensitivity. The assay is based on performing isothermal RNA sequence amplification in the presence of our novel DNA probe, an intercalation activating fluorescence DNA probe, and measuring the fluorescence intensity of the reaction mixture. When detecting mecA gene expression of methicillin-resistant Staphylococcus aureus, we quantified starting copies ranging from 10 to 10^7 copies within 10 min. The primer sequences were designed to bind to secondary structure-free sites of the target RNA, which enabled a totally isothermal protocol to quantify mRNA specifically in a sample of existing genomic DNA. When we applied this to quantifying the expression of marker genes of Vibrio parahaemolyticus and Mycobacterium bovis BCG strain, the results correlated well with the viability of each bacterium. We also demonstrated monitoring Pab gene expression of M. bovis BCG during cultivation with antibiotics. The present method can potentially realize rapid antimicrobial susceptibility testing of slowly growing organisms, such as tuberculosis.

20.
We recently developed novel algorithms for exhaustive identification of all nucleotide subsequences present in a pathogen genome which differ by at least a chosen number of mismatches from the sequences of host/background organisms. This type of exhaustive computational analysis will be useful in reducing false positives and cross-reactivity in PCR and hybridization assays. We present the first experimental test of the method by showing that the subsequences identified when used as 18-mer PCR primers can detect the presence of dengue virus (DENV) even in the presence of a large excess of complex human genomic DNA. From our computations, 715 serotype-specific primer pairs were identified for three different DENV serotypes in which each primer sequence lies at least two mismatches from the nearest human sequence. DNA clones of representative strains of DENV-1, DENV-2, and DENV-4 viruses were subjected to real-time PCR testing using eight primer pairs each. Efficiencies were uniformly very high (mean ± S.D. = 99.6 ± 3%), and amplification of human DNA was never observed within 35 cycles, even at a 5.5-fold molar excess of human DNA. Exhaustive primer/probe screening can potentially produce more selective and sensitive diagnostic assays for pathogens, especially in the presence of complex backgrounds.
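
The selection criterion described here, keeping pathogen subsequences that lie at least a chosen number of mismatches from every background subsequence, can be illustrated with a brute-force sketch. It is not the authors' algorithm, which is built for exhaustive genome-scale search. It considers substitutions on one strand only, and all names and toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def kmers(seq, k):
    """Set of all distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def selective_kmers(pathogen, background, k, min_mismatches):
    """Pathogen k-mers whose nearest background k-mer is at least
    min_mismatches substitutions away (brute force, toy scale only)."""
    host = kmers(background, k)
    keep = []
    for candidate in sorted(kmers(pathogen, k)):
        nearest = min((hamming(candidate, h) for h in host), default=k)
        if nearest >= min_mismatches:
            keep.append(candidate)
    return keep

# Toy data (hypothetical): 4-mers at least two mismatches from the background.
primers = selective_kmers("AAAACCCC", "AAAATTTT", 4, 2)
```

Comparing every candidate with every background k-mer is quadratic, so screening 18-mers against a human-sized background, as in the abstract, would need an indexed search rather than this loop.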

