Found 20 similar articles (search time: 0 ms)
1.
2.
Selecting oligonucleotide probes for whole-genome tiling arrays with a cross-hybridization potential
Hafemeister C Krause R Schliep A 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(6):1642-1652
Popular current methods for designing oligonucleotide tiling arrays still rely on simple criteria like Hamming distance or longest common factors, neglecting base stacking effects, which strongly contribute to binding energies. Consequently, probes are often prone to cross-hybridization, which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method using hybridization energy to identify specific oligonucleotide probes. Our Cross-Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The computations are accelerated by a filter using weighted ungapped q-grams to arrive at seeds. The computation of the CHP is implemented in our software OSProbes, available under the GPL, which computes sets of viable probe candidates. The user can choose a trade-off between running time and quality of probes selected. We obtain very favorable results in comparison with prior approaches with respect to specificity and sensitivity for cross-hybridization and genome coverage with high-specificity probes. The combination of OSProbes and our Tileomatic method, which computes optimal tiling paths from candidate sets, yields globally optimal tiling arrays, balancing probe distance, hybridization conditions, and uniqueness of hybridization.
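As a rough illustration of the two simple criteria named in the abstract above, the following Python sketch computes the Hamming distance and the length of the longest common factor (longest common substring) of two probe sequences. The function names and example sequences are illustrative assumptions; this is not the CHP computation and not part of OSProbes.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def longest_common_factor(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # match lengths ending at the previous char of a
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common run that ended one position earlier in both.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical 8-mers, used only to exercise the functions.
probe = "ACGTACGT"
off_target = "ACGAACGT"
print(hamming_distance(probe, off_target))
print(longest_common_factor(probe, "TTACGTAA"))
```

Both measures compare sequences base by base and ignore which neighbouring bases stack, which is the limitation the abstract points out.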
3.
Utilizing tiling microarrays for whole-genome analysis in plants (Total citations: 1; self-citations: 0; citations by others: 1)
4.
5.
Background
Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome.
6.
Current microarray technology allows researchers to genotype a large number of SNPs with relatively small amounts of DNA. Nevertheless, researchers and clinicians still frequently face the problem of acquiring enough high-quality DNA for analysis. Whole-genome amplification (WGA) methods offer a solution to this problem, and earlier studies have shown that WGA samples perform reasonably well in small-scale genetic analyses (e.g. Affymetrix 10K array). To determine the performance of WGA products on a large-scale genotyping array, we compared the Affymetrix 250K array genotyping results of genomic DNA and their WGA products from four individuals. Our results indicate that WGA products perform well on the 250K array compared to genomic DNA, especially when using the BRLMM calling algorithm. WGA samples have high call rates (97.5% on average, compared to 99.4% for genomic DNA) and excellent concordance rates with their corresponding genomic DNA samples (98.7% on average). In addition, no apparent systematic genomic amplification bias can be detected. This study demonstrates that, although there is a slight decrease in the total call rates, WGA methods provide a reliable approach for increasing the amount of DNA samples for use with a common SNP genotyping array.
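The call rate and concordance figures quoted above can be illustrated with a minimal Python sketch. It assumes genotype calls are stored as lists of strings with `None` marking a no-call, and that concordance is measured only over SNPs called in both samples. Both conventions and all names are assumptions made for illustration, not the study's actual pipeline.

```python
from typing import Optional, Sequence

Call = Optional[str]  # e.g. "AA", "AB", "BB"; None means no call (assumed encoding)


def call_rate(calls: Sequence[Call]) -> float:
    """Fraction of SNPs that received a genotype call."""
    return sum(c is not None for c in calls) / len(calls)


def concordance(calls_a: Sequence[Call], calls_b: Sequence[Call]) -> float:
    """Fraction of identical genotypes among SNPs called in both samples."""
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not None and b is not None]
    return sum(a == b for a, b in both) / len(both)


# Toy data for one hypothetical individual: genomic DNA versus its WGA product.
genomic = ["AA", "AB", "BB", "AB"]
wga = ["AA", None, "BB", "AA"]
print(call_rate(wga))
print(concordance(genomic, wga))
```

Excluding no-calls from the concordance denominator keeps the two metrics independent: a sample can have a lower call rate and still agree on every SNP it does call.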
7.
8.
9.
10.
Statistical analysis of tiling array data is extremely challenging due to the astronomically large number of sequence probes, high noise levels of individual probes and limited number of replicates in these data. To overcome these difficulties, we first developed statistical error estimation and weighted ANOVA modeling approaches to high-density tiling array data, especially the former based on an advanced error-pooling method to accurately obtain heterogeneous technical error of small-sample tiling array data. Based on these approaches, we analyzed the high-density tiling array data of the temporal replication patterns during cell-cycle S phase of synchronized HeLa cells on human chromosomes 21 and 22. We found many novel temporal replication patterns, identifying about 26% of over 1 million tiling array sequence probes with significant differential replication during the four 2-h time periods of S phase. Among these differentially replicated probes, 126941 sequence probes were matched to 417 known genes. The majority of these genes were found to be replicated within one or two consecutive time periods, while the others were replicated at two non-consecutive time periods. Also, coding regions were found to be more differentially replicated in particular time periods than noncoding regions in the gene-poor chromosome 21 (25% differentially replicated among genic probes versus 18.6% among intergenic probes), while such a phenomenon was less prominent in gene-rich chromosome 22. Rigorous statistical testing for local proximity of differentially replicated genic and intergenic probes was performed to identify significant stretches of differentially replicated sequence regions. From this analysis, we found that adjacent genes were frequently replicated at different time periods, potentially implying the existence of quite dense replication origins. Evaluating the conditional probability significance of identified gene ontology terms on chromosomes 21 and 22, we detected some over-represented molecular functions and biological processes among these differentially replicated genes, such as the ones relevant to hydrolase, transferase and receptor-binding activities. Some of these results were confirmed, showing >70% consistency with cDNA microarray data that were independently generated in parallel with the tiling arrays. Thus, our improved analysis approaches specifically designed for high-density tiling array data enabled us to reliably and sensitively identify many novel temporal replication patterns on human chromosomes.
11.
Multiple displacement-based whole-genome DNA amplification is a promising tool to obtain sufficient DNA from small tissue specimens for various genetic analyses, such as SNP array-based analysis. Using Affymetrix 10 K and 100 K SNP mapping arrays, we evaluated the performance of the Phi29 DNA polymerase-based genome amplification. Greater than 99% concordance in genotyping calls was achieved between amplified and non-amplified DNAs for both arrays. By utilizing the Affymetrix GeneChip Chromosome Copy Number Tool, the allelic imbalance profiles for the advanced stage oral tongue squamous cell carcinoma (OTSCC) were generated based on 10 K and 100 K SNP mapping array results. The results from these two array platforms agree closely, but more precise allelic imbalance patterns can be revealed from the 100 K SNP mapping array data. Furthermore, our data suggested a frequent loss at 3p11–p12 for advanced stage OTSCC.
12.
To facilitate whole-genome association studies (WGAS), several high-density SNP genotyping arrays have been developed. Genetic coverage and statistical power are the primary benchmark metrics in evaluating the performance of SNP arrays. Ideally, such evaluations would be done on a SNP set and a cohort of individuals that are both independently sampled from the original SNPs and individuals used in developing the arrays. Without utilization of an independent test set, previous estimates of genetic coverage and statistical power may be subject to an overfitting bias. Additionally, the SNP arrays' statistical power in WGAS has not been systematically assessed on real traits. One robust setting for doing so is to evaluate statistical power on thousands of traits measured from a single set of individuals. In this study, 359 newly sampled Americans of European descent were genotyped using both Affymetrix 500K (Affx500K) and Illumina 650Y (Ilmn650K) SNP arrays. From these data, we were able to obtain estimates of genetic coverage, which are robust to overfitting, by constructing an independent test set from among these genotypes and individuals. Furthermore, we collected liver tissue RNA from the participants and profiled these samples on a comprehensive gene expression microarray. The RNA levels were used as a large-scale set of quantitative traits to calibrate the relative statistical power of the commercial arrays. Our genetic coverage estimates are lower than previous reports, providing evidence that previous estimates may be inflated due to overfitting. The Ilmn650K platform showed reasonable power (50% or greater) to detect SNPs associated with quantitative traits when the signal-to-noise ratio (SNR) is greater than or equal to 0.5 and the causal SNP's minor allele frequency (MAF) is greater than or equal to 20% (N=359). In testing each of the more than 40,000 gene expression traits for association to each of the SNPs on the Ilmn650K and Affx500K arrays, we found that the Ilmn650K yielded 15% more discoveries than the Affx500K at the same false discovery rate (FDR) level.
13.
14.
Optimized design and assessment of whole genome tiling arrays (Total citations: 1; self-citations: 0; citations by others: 1)
Gräf S Nielsen FG Kurtz S Huynen MA Birney E Stunnenberg H Flicek P 《Bioinformatics (Oxford, England)》2007,23(13):i195-i204
MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential. RESULTS: We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs. AVAILABILITY: Source code is available under an open source license from http://www.ebi.ac.uk/~graef/arraydesign/.
15.
Fundamentals of DNA hybridization arrays for gene expression analysis (Total citations: 13; self-citations: 0; citations by others: 13)
DNA hybridization arrays [also known as macroarrays, microarrays and/or high-density oligonucleotide arrays (Gene Chips)] bring gene expression analysis to a genomic scale by permitting investigators to simultaneously examine changes in the expression of literally thousands of genes. For hybridization arrays, the general approach is to immobilize gene-specific sequences (probes) on a solid state matrix (nylon membranes, glass microscope slides, silicon/ceramic chips). These sequences are then queried with labeled copies of nucleic acids from biological samples (targets). The underlying theory is that the greater the expression of a gene, the greater the amount of labeled target, and hence, the greater output signal. In spite of the simplicity of the experimental design, there are at least four different platforms and several different approaches to processing and labeling the biological samples. Moreover, investigators must also determine whether they will utilize commercially available arrays or generate their own. This review will cover the status of the hybridization array field with an eye toward underlying principles and available technologies. Future developments and technological trends will also be evaluated.
16.
HAN Yujun NI Peixiang LÜ Hong YE Jia HU Jianfei CHEN Chen HUANG Xiangang CONG Lijuan LI Guangyuan WANG Jing GU Xiaocheng YU Jun & LI Songgang. College of Life Sciences, Peking University, Beijing, China; Beijing Genomics Institute, Chinese Academy of Sciences, Beijing, China; James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou, China 《Science in China: Life Sciences (English Edition)》2005,48(3):300-306
Parallel to improvements in DNA sequencing and computer technologies, the output of bio-information grows dramatically every year. More and more species with important commercial, medical and biological significance have been or are being sequenced. There are two kinds of whole-genome sequencing strategies: the clone-by-clone shotgun method (hierarchical shotgun) and the whole-genome shotgun (WGS) method, each with its individual strengths and drawbacks. In the clone-by-clone method, the a…
17.
HAN Yujun NI Peixiang LÜ Hong YE Jia HU Jianfei CHEN Chen HUANG Xiangang CONG Lijuan LI Guangyuan WANG Jing GU Xiaocheng YU Jun LI Songgang 《Science in China Series C (English Edition)》2005,48(3)
Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on the experience of building the whole-genome working draft of Oryza sativa L. ssp. indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report on novel applications during the subsequent data-mining processes, such as acquiring precise insert fragment information of each clone across the genome, and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.
18.
19.
20.
Winzeler EA Castillo-Davis CI Oshiro G Liang D Richards DR Zhou Y Hartl DL 《Genetics》2003,163(1):79-89
The availability of a complete genome sequence allows the detailed study of intraspecies variability. Here we use high-density oligonucleotide arrays to discover 11,115 single-feature polymorphisms (SFPs) existing in one or more of 14 different yeast strains. We use these SFPs to define regions of genetic identity between common laboratory strains of yeast. We assess the genome-wide distribution of genetic variation on the basis of this yeast population. We find that genome variability is biased toward the ends of chromosomes and is more likely to be found in genes with roles in fermentation or in transport. This subtelomeric bias may arise through recombination between nonhomologous sequences because full-gene deletions are more common in these regions than in more central regions of the chromosome.