Similar articles
 Found 20 similar articles (search time: 296 ms)
1.

Background  

Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence be known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome.

2.
Optimized design and assessment of whole genome tiling arrays   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential. RESULTS: We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs. AVAILABILITY: Source code is available under an open source license from http://www.ebi.ac.uk/~graef/arraydesign/.
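The uniqueness score in the abstract above can be illustrated with a brute-force sketch. It assumes one plausible reading of the abstract: a probe position counts toward U when a substring that starts there and ends inside the probe occurs exactly once in the genome. The function names are hypothetical, the probe is assumed to be cut from the genome, and the plain string search stands in for the fast method the authors mention but do not detail here.

```python
def count_occurrences(genome: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in genome."""
    count, start = 0, 0
    while True:
        hit = genome.find(pattern, start)
        if hit == -1:
            return count
        count += 1
        start = hit + 1


def uniqueness_score(genome: str, probe: str) -> int:
    """Assumed definition of U (not taken from the paper's text).

    Counts probe start positions i for which the suffix probe[i:] occurs
    exactly once in the genome. For a probe cut from the genome this is the
    same as asking whether any substring starting at i and ending inside
    the probe is unique, because extending a unique substring keeps it unique.
    """
    return sum(
        1
        for i in range(len(probe))
        if count_occurrences(genome, probe[i:]) == 1
    )


# Toy example: a 9-base "genome" and two 4-base probes cut from it.
toy_genome = "ACGTACGGT"
score_repetitive = uniqueness_score(toy_genome, "ACGT")
score_distinct = uniqueness_score(toy_genome, "ACGG")
```

A probe with more unique start positions is less likely to cross-hybridize under this reading, so a designer would prefer the second toy probe over the first.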

3.
Comparative genome hybridization (CGH) to DNA microarrays (array CGH) is a technique capable of detecting deletions and duplications in genomes at high resolution. However, array CGH studies of the human genome noting false negative and false positive results using large insert clones as probes have raised important concerns regarding the suitability of this approach for clinical diagnostic applications. Here, we adapt the Smith–Waterman dynamic-programming algorithm to provide a sensitive and robust analytic approach (SW-ARRAY) for detecting copy-number changes in array CGH data. In a blind series of hybridizations to arrays consisting of the entire tiling path for the terminal 2 Mb of human chromosome 16p, the method identified all monosomies between 267 and 1567 kb with a high degree of statistical significance and accurately located the boundaries of deletions in the range 267–1052 kb. The approach is unique in offering both a nonparametric segmentation procedure and a nonparametric test of significance. It is scalable and well-suited to high resolution whole genome array CGH studies that use array probes derived from large insert clones as well as PCR products and oligonucleotides.
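The idea of applying a Smith–Waterman-style recurrence to array CGH data can be sketched in one dimension. The code below is a simplified illustration, not the SW-ARRAY implementation: it assumes the input is a list of log2 ratios ordered along the chromosome, and the `threshold` value and function name are hypothetical. Each probe contributes a score of minus its log ratio minus the threshold, and the running sum is reset to zero whenever it becomes non-positive, as in local alignment.

```python
from typing import List, Optional, Tuple


def best_deletion_segment(
    log_ratios: List[float], threshold: float = 0.5
) -> Tuple[float, Optional[int], Optional[int]]:
    """Return (score, first_index, last_index) of the highest-scoring run.

    Probes with log ratio below -threshold add to the score; others subtract.
    Returns (0.0, None, None) when no probe scores positively.
    """
    best_score, best_start, best_end = 0.0, None, None
    running, start = 0.0, 0
    for i, ratio in enumerate(log_ratios):
        running += -ratio - threshold
        if running <= 0.0:
            # Local-alignment reset: drop the segment and restart after i.
            running, start = 0.0, i + 1
        elif running > best_score:
            best_score, best_start, best_end = running, start, i
    return best_score, best_start, best_end


# Toy profile: three probes near -1 (one copy lost) flanked by normal probes.
profile = [0.1, -0.9, -1.1, -1.0, 0.05, 0.0]
segment = best_deletion_segment(profile)
```

The abstract also mentions a nonparametric significance test. One common way to obtain such a test, offered here only as an assumption, is to recompute the best score on randomly permuted profiles and compare.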

4.
Statistical analysis of tiling array data is extremely challenging due to the astronomically large number of sequence probes, high noise levels of individual probes and limited number of replicates in these data. To overcome these difficulties, we first developed statistical error estimation and weighted ANOVA modeling approaches to high-density tiling array data, especially the former based on an advanced error-pooling method to accurately obtain heterogeneous technical error of small-sample tiling array data. Based on these approaches, we analyzed the high-density tiling array data of the temporal replication patterns during cell-cycle S phase of synchronized HeLa cells on human chromosomes 21 and 22. We found many novel temporal replication patterns, identifying about 26% of over 1 million tiling array sequence probes with significant differential replication during the four 2-h time periods of S phase. Among these differentially replicated probes, 126,941 sequence probes were matched to 417 known genes. The majority of these genes were found to be replicated within one or two consecutive time periods, while the others were replicated at two non-consecutive time periods. Also, coding regions were found to be more differentially replicated in particular time periods than noncoding regions in the gene-poor chromosome 21 (25% differentially replicated among genic probes versus 18.6% among intergenic probes), while such a phenomenon was less prominent in gene-rich chromosome 22. A rigorous statistical test for local proximity of differentially replicated genic and intergenic probes was performed to identify significant stretches of differentially replicated sequence regions. From this analysis, we found that adjacent genes were frequently replicated at different time periods, potentially implying the existence of quite dense replication origins.
Evaluating the conditional probability significance of identified gene ontology terms on chromosomes 21 and 22, we detected some over-represented molecular functions and biological processes among these differentially replicated genes, such as the ones relevant to hydrolase, transferase and receptor-binding activities. Some of these results were confirmed, showing >70% consistency with cDNA microarray data that were independently generated in parallel with the tiling arrays. Thus, our improved analysis approaches specifically designed for high-density tiling array data enabled us to reliably and sensitively identify many novel temporal replication patterns on human chromosomes.

5.
Pseudomonas aeruginosa is an important pathogenic and environmental bacterium, with the most widely studied strain being PAO1. Using the PAO1 reference cosmid library and the recently completed PAO1 genome sequence, we have mapped a minimal tiling path across the genome using a two-step strategy. First, we sequenced both ends of a set of over 500 random and previously mapped clones to create a backbone. Second, we end-sequenced a second set of cosmid clones that were identified to lie within the larger gaps using hybridization of the reference library filters with probes designed against sequences at the center of each gap. The minimal tiling path was calculated using the program Domino (http://www.bit.uq.edu.au/download/), with the overlap between adjacent clones set to 5 kb (where possible) to minimize the chance of truncating genes. This yielded a minimal tiling cosmid library (334 clones) covering 93.7% of the genome in 57 contigs. This library has reduced to a workable set the number of clones required to represent the majority of the P. aeruginosa genome and gives the precise location of each cosmid, enabling most genes of interest to be located on clones without further screening. This library should prove a useful resource to accelerate functional analysis of the P. aeruginosa genome.
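Computing a minimal tiling path from mapped clones is an instance of interval covering, which a greedy scan solves. The sketch below is not the Domino program and does not reproduce its 5 kb overlap preference; it assumes each clone is a `(start, end)` pair of genome coordinates, treats clones that merely touch as overlapping, and starts a new contig wherever no clone bridges a gap. The function name is hypothetical.

```python
from typing import List, Tuple

Clone = Tuple[int, int]  # (start, end) genome coordinates, start < end


def minimal_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedy interval cover: fewest clones spanning everything the input spans."""
    # Sort by start; for equal starts put the longest clone first.
    ordered = sorted(clones, key=lambda c: (c[0], -c[1]))
    path: List[Clone] = []
    i, n = 0, len(ordered)
    while i < n:
        if path and ordered[i][0] <= path[-1][1]:
            reach = path[-1][1]
            best = None
            # Among clones starting inside the covered region, keep the one
            # that extends coverage furthest to the right.
            while i < n and ordered[i][0] <= reach:
                if ordered[i][1] > reach and (best is None or ordered[i][1] > best[1]):
                    best = ordered[i]
                i += 1
            if best is not None:
                path.append(best)
        else:
            # Nothing overlaps the current coverage: a gap, so open a new contig.
            path.append(ordered[i])
            i += 1
    return path


# Toy library: six clones, one of which lies beyond a gap.
library = [(0, 40), (10, 35), (30, 70), (20, 60), (65, 100), (150, 180)]
tiling = minimal_tiling_path(library)
```

Enforcing a minimum overlap, as the authors did, would mean restricting candidates to clones that start at least that far before the current reach and falling back to any overlapping clone when none qualifies.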

6.
7.
We carried out a cross-species cattle-sheep array comparative genome hybridization experiment to identify copy number variations (CNVs) in the sheep genome, analysing ewes of Italian dairy or dual-purpose breeds (Bagnolese, Comisana, Laticauda, Massese, Sarda, and Valle del Belice) using a tiling oligonucleotide array with ~385,000 probes designed on the bovine genome. We identified 135 CNV regions (CNVRs; 24 reported in more than one animal) covering ~10.5 Mb of the virtual sheep genome referred to the bovine genome (0.398%), with a mean and a median size of 77.6 and 55.9 kb, respectively. A comparative analysis between the identified sheep CNVRs and those reported in cattle and goat genomes indicated that overlaps between sheep CNVRs and those of both other species are highly significant (P<0.0001), suggesting that several chromosome regions might contain recurrent interspecies CNVRs. Many sheep CNVRs include genes with important biological functions. Further studies are needed to evaluate their functional relevance.

8.
9.
10.
Keleş S. Biometrics, 2007, 63(1): 10-21
Chromatin immunoprecipitation followed by DNA microarray analysis (ChIP-chip methodology) is an efficient way of mapping genome-wide protein-DNA interactions. Data from tiling arrays encompass DNA-protein interaction measurements on thousands or millions of short oligonucleotides (probes) tiling a whole chromosome or genome. We propose a new model-based method for analyzing ChIP-chip data. The proposed model is motivated by the widely used two-component multinomial mixture model of de novo motif finding. It utilizes a hierarchical gamma mixture model of binding intensities while incorporating inherent spatial structure of the data. In this model, genomic regions belong to either one of the following two general groups: regions with a local protein-DNA interaction (peak) and regions lacking this interaction. Individual probes within a genomic region are allowed to have different localization rates accommodating different binding affinities. A novel feature of this model is the incorporation of a distribution for the peak size derived from the experimental design and parameters. This leads to the relaxation of the fixed peak size assumption that is commonly employed when computing a test statistic for these types of spatial data. Simulation studies and a real data application demonstrate good operating characteristics of the method including high sensitivity with small sample sizes when compared to available alternative methods.

11.
12.
13.
14.
Herold KE, Rasooly A. BioTechniques, 2003, 35(6): 1216-1221
Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
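Tiling with temperature-matched oligonucleotides of variable length can be sketched as follows. This is not the Oligo Design algorithm, whose melting-temperature model the abstract does not state. The sketch assumes the simple Wallace rule (2 °C per A or T, 4 °C per G or C), and the parameter names and default values are hypothetical. At each tiling offset the oligo is extended one base at a time until it first reaches the target temperature, so GC-rich stretches yield shorter oligos.

```python
from typing import List, Tuple


def wallace_tm(oligo: str) -> int:
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def tile_temperature_matched(
    sequence: str,
    target_tm: int = 40,
    step: int = 5,
    min_len: int = 12,
    max_len: int = 30,
) -> List[Tuple[int, str]]:
    """Return (offset, oligo) pairs; oligos overlap when step < oligo length."""
    tiles = []
    for start in range(0, len(sequence) - min_len + 1, step):
        for length in range(min_len, max_len + 1):
            if start + length > len(sequence):
                break  # ran off the end before reaching the target
            oligo = sequence[start:start + length]
            if wallace_tm(oligo) >= target_tm:
                tiles.append((start, oligo))  # shortest oligo that qualifies
                break
    return tiles


# Toy target: a 40-base repeat, tiled every 5 bases.
tiles = tile_temperature_matched("ACGT" * 10)
```

Offsets where no oligo within the length bounds reaches the target are skipped, which leaves a gap in the tiling rather than emitting a mismatched oligo.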

15.
16.
17.
MOTIVATION: The resolution at which genomic alterations can be mapped by means of oligonucleotide aCGH (array-based comparative genomic hybridization) is limited by two factors: the availability of high-quality probes for the target genomic sequence and the array real estate. Optimization of the probe selection process is required for arrays that are designed to probe specific genomic regions in very high resolution without compromising probe quality constraints. RESULTS: In this paper we describe a well-defined optimization problem associated with the problem of probe selection for high-resolution aCGH arrays. We propose the whenever possible ε-cover as a formulation that faithfully captures the requirements of the probe selection problem, and provide a fast randomized algorithm that solves the optimization problem in O(n log n) time, as well as a deterministic algorithm with the same asymptotic performance. We apply the method in a typical high-definition array design scenario and demonstrate its superiority with respect to alternative approaches. AVAILABILITY: Address requests to the authors.

18.
19.
20.
DNA microarrays can analyze the expression of the same set of thousands of genes under at least two different experimental conditions. However, DNA microarrays require substantial amounts of RNA to generate the probes, especially when bacterial RNA is used for hybridization (50 µg of bacterial total RNA contains approximately 2 µg of mRNA). We have developed a computer-based algorithm for prediction of the minimal number of primers to specifically anneal to all genes in a given genome. The algorithm predicts, for example, that 37 oligonucleotides should prime all genes in the Mycobacterium tuberculosis genome. We tested the usefulness of the genome-directed primers (GDPs) in comparison to random primers for gene expression profiling using DNA microarrays. Both types of primers were used to generate fluorescent-labeled probes and to hybridize to an array of 960 mycobacterial genes. Compared to random-primer probes, the GDP probes were more sensitive and more specific, especially when mammalian RNA samples were spiked with mycobacterial RNA. The GDPs were used for gene expression profiling of mycobacterial cultures grown to early log or stationary growth phases. This approach could be useful for accurate genome-wide expression analysis, especially for in vivo gene expression profiling, as well as directed amplification of sequenced genomes.
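Finding a small set of primers that together hit every gene is a set-cover problem, and a greedy heuristic gives a feel for it. The abstract does not describe the authors' algorithm, so the sketch below is only an assumed stand-in: a primer is modelled as a k-mer, a gene counts as primed when it contains that k-mer, and strand orientation and annealing specificity are ignored. The function name and the value of `k` are hypothetical. Greedy set cover returns a small set, not necessarily the minimal one.

```python
from typing import Dict, List, Set, Tuple


def greedy_primer_set(genes: Dict[str, str], k: int = 3) -> Tuple[List[str], Set[str]]:
    """Pick k-mers greedily until every gene contains at least one of them.

    Returns (chosen_kmers, genes_left_uncovered).
    """
    # Index: which genes contain each k-mer.
    hits: Dict[str, Set[str]] = {}
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            hits.setdefault(seq[i:i + k], set()).add(name)

    uncovered = set(genes)
    chosen: List[str] = []
    candidates = sorted(hits)  # fixed order makes tie-breaking reproducible
    while uncovered and candidates:
        # Take the k-mer that primes the most still-uncovered genes.
        best = max(candidates, key=lambda kmer: len(hits[kmer] & uncovered))
        gained = hits[best] & uncovered
        if not gained:
            break  # remaining genes are shorter than k
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Toy genome of three short "genes".
toy_genes = {"g1": "AAACCC", "g2": "TTAAAC", "g3": "GGGTTT"}
primers, missed = greedy_primer_set(toy_genes)
```

On a real genome the candidate primers would also be filtered for melting temperature and for off-target matches before selection.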
