20 similar articles found.
1.
Quantile smoothing of array CGH data (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: Plots of array Comparative Genomic Hybridization (CGH) data often show special patterns: stretches of constant level (copy number) with sharp jumps between them. There can also be much noise. Classic smoothing algorithms do not work well, because they introduce too much rounding. To remedy this, we introduce a fast and effective smoothing algorithm based on penalized quantile regression. It can compute arbitrary quantile curves, but we concentrate on the median to show the trend and the lower and upper quartile curves showing the spread of the data. Two-fold cross-validation is used for optimizing the weight of the penalties. RESULTS: Simulated data and a published dataset are used to show the capabilities of the method to detect the segments of changed copy numbers in array CGH data.
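To make the idea of penalized quantile regression concrete, the sketch below evaluates the kind of objective such a smoother minimizes: a check (pinball) loss for the chosen quantile plus a weighted roughness penalty. The abstract does not state the form of the penalty; the sum of absolute first differences used here is an assumption, and the function names, toy data and penalty weight are hypothetical. The code only scores candidate curves; it is not a solver and not the authors' implementation.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def penalized_quantile_objective(y, z, tau, lam):
    """Quantile fit term plus lam times the sum of absolute first differences of z.

    The absolute-difference penalty is an assumption made for this sketch.
    """
    fit = sum(pinball_loss(yi - zi, tau) for yi, zi in zip(y, z))
    roughness = sum(abs(b - a) for a, b in zip(z, z[1:]))
    return fit + lam * roughness


# Toy profile: a flat stretch with one outlier, then a jump to a higher level.
y = [0.0, 0.1, -0.1, 2.0, 0.0, 1.0, 1.1, 0.9, 1.0]
step_fit = [0.0] * 5 + [1.0] * 4   # piecewise-constant candidate curve
outlier_chasing = list(y)          # candidate that reproduces every point

# tau=0.5 targets the median curve; lam is the penalty weight.
print(penalized_quantile_objective(y, step_fit, tau=0.5, lam=1.0))
print(penalized_quantile_objective(y, outlier_chasing, tau=0.5, lam=1.0))
```

With these toy values the step-shaped candidate scores lower than the curve that follows every point, which is the behavior one wants for profiles made of flat stretches and sharp jumps. Setting tau to 0.25 or 0.75 would score candidates for the lower and upper quartile curves instead.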
2.
CNVDetector is a program for locating copy number variations (CNVs) in a single genome. CNVDetector has several merits: (i) it can deal with the array comparative genomic hybridization data even if the noise is not normally distributed; (ii) it has a linear time kernel; (iii) its parameters can be easily selected; (iv) it evaluates the statistical significance for each CNV calling. AVAILABILITY: CNVDetector (for Windows platform) can be downloaded from http://www.csie.ntu.edu.tw/~kmchao/tools/CNVDetector/. The manual of CNVDetector is also available.
3.
Array comparative genomic hybridization (aCGH) is a laboratory technique to measure chromosomal copy number changes. A clear biological interpretation of the measurements is obtained by mapping these onto an ordinal scale with categories loss/normal/gain of a copy. The pattern of gains and losses harbors a level of tumor specificity. Here, we present WECCA (weighted clustering of called aCGH data), a method for weighted clustering of samples on the basis of the ordinal aCGH data. Two similarities to be used in the clustering and particularly suited for ordinal data are proposed, which are generalized to deal with weighted observations. In addition, a new form of linkage, especially suited for ordinal data, is introduced. In a simulation study, we show that the proposed cluster method is competitive with clustering using the continuous data. We illustrate WECCA using an application to a breast cancer data set, where WECCA finds a clustering that relates better with survival than the original one.
4.
MOTIVATION: The identification of DNA copy number changes provides insights that may advance our understanding of initiation and progression of cancer. Array-based comparative genomic hybridization (array-CGH) has emerged as a technique allowing high-throughput genome-wide scanning for chromosomal aberrations. A number of statistical methods have been proposed for the analysis of array-CGH data. In this article, we consider a fused quantile regression model based on three motivations: (1) quantile regression may provide a more comprehensive picture for the ratio profile of copy numbers than the standard mean regression approach; (2) for simplicity, most available methods assume uniform spacing between neighboring clones, while incorporating the information of physical locations of clones may be helpful and (3) most current methods have a set of tuning parameters that must be carefully tuned, which introduces complexity to the implementation. RESULTS: We formulate the detection of regions of gains and losses in a fused regularized quantile regression framework, incorporating physical locations of clones. We derive an efficient algorithm that computes the entire solution path for the resulting optimization problem, and we propose a simple estimate for the complexity of the fitted model, which leads to convenient selection of the tuning parameter. Three published array-CGH datasets are used to demonstrate our approach. AVAILABILITY: R code is available at http://www.stat.lsa.umich.edu/~jizhu/code/cgh/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
5.
Background
Microarray-CGH experiments are used to detect and map chromosomal imbalances, by hybridizing targets of genomic DNA from a test and a reference sample to sequences immobilized on a slide. These probes are genomic DNA sequences (BACs) that are mapped on the genome. The signal has a spatial coherence that can be handled by specific statistical tools. Segmentation methods seem to be a natural framework for this purpose. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose BACs share the same relative copy number on average. We model a CGH profile by a random Gaussian process whose distribution parameters are affected by abrupt changes at unknown coordinates. Two major problems arise: determining which parameters are affected by the abrupt changes (the mean and the variance, or the mean only), and selecting the number of segments in the profile.
6.
van de Wiel MA Kim KI Vosse SJ van Wieringen WN Wilting SM Ylstra B 《Bioinformatics (Oxford, England)》2007,23(7):892-894
CGHcall achieves high calling accuracy for array CGH data by effective use of breakpoint information from segmentation and by inclusion of several biological concepts that are ignored by existing algorithms. The algorithm is validated for simulated and verified real array CGH data. By incorporating more than three classes, CGHcall improves detection of single copy gains and amplifications. Moreover, it allows effective inclusion of chromosome arm information. AVAILABILITY: An R-package (GUI), a manual and an example data set are available at http://www.few.vu.nl/~mavdwiel/CGHcall.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
7.
Recent advances in array comparative genomic hybridization (array CGH) technology are revolutionizing our understanding of tumor genomes. Marker-based arrays enable rapid survey at megabase intervals, while tiling path arrays examine the entire genome in unprecedented detail. Tumor biopsies are typically small and contain infiltrating stromal cells, requiring tedious microdissection. Tissue heterogeneity is a major barrier to high-throughput profiling of tumor genomes and is also an important consideration for the introduction of array CGH to clinical settings. We propose that increasing array resolution will enhance detection sensitivity in mixed tissues and as a result significantly reduce microdissection requirements. In this study, we first simulated normal cell contamination to determine the heterogeneity tolerance of array CGH and then validated this detection sensitivity model on cancer specimens using the newly developed submegabase resolution tiling-set (SMRT) array, which spans the human genome with 32,433 overlapping BAC clones.
8.
Robust smooth segmentation approach for array CGH data analysis (total citations: 2; self-citations: 0; citations by others: 2)
Huang J Gusnanto A O'Sullivan K Staaf J Borg A Pawitan Y 《Bioinformatics (Oxford, England)》2007,23(18):2463-2469
MOTIVATION: Array comparative genomic hybridization (aCGH) provides a genome-wide technique to screen for copy number alteration. The existing segmentation approaches for analyzing aCGH data are based on modeling data as a series of discrete segments with unknown boundaries and unknown heights. Although the biological process of copy number alteration is discrete, in reality a variety of biological and experimental factors can cause the signal to deviate from a stepwise function. To take this into account, we propose a smooth segmentation (smoothseg) approach. METHODS: To achieve a robust segmentation, we use a doubly heavy-tailed random-effect model. The first heavy-tailed structure on the errors deals with outliers in the observations, and the second deals with possible jumps in the underlying pattern associated with different segments. We develop a fast and reliable computational procedure based on the iterative weighted least-squares algorithm with band-limited matrix inversion. RESULTS: Using simulated and real data sets, we demonstrate how smoothseg can aid in identification of regions with genomic alteration and in classification of samples. For the real data sets, smoothseg leads to smaller false discovery rate and classification error rate than the circular binary segmentation (CBS) algorithm. In a realistic simulation setting, smoothseg is better than wavelet smoothing and CBS in identification of regions with genomic alterations and better than CBS in classification of samples. For comparative analyses, we demonstrate that segmenting the t-statistics performs better than segmenting the data. AVAILABILITY: The R package smoothseg to perform smooth segmentation is available from http://www.meb.ki.se/~yudpaw.
9.
Background
In two-channel competitive genomic hybridization microarray experiments, the ratio of the two fluorescent signal intensities at each spot on the microarray is commonly used to infer the relative amounts of the test and reference sample DNA levels. This ratio may be influenced by systematic measurement effects from non-biological sources that can introduce biases in the estimated ratios. These biases should be removed before drawing conclusions about the relative levels of DNA. The performance of existing gene expression microarray normalization strategies has not been evaluated for removing systematic biases encountered in array-based comparative genomic hybridization (CGH), which aims to detect single copy gains and losses typically in samples with heterogeneous cell populations resulting in only slight shifts in signal ratios. The purpose of this work is to establish a framework for correcting the systematic sources of variation in high density CGH array images, while maintaining the true biological variations.
10.
MOTIVATION: In recent years, a range of techniques for analysis and segmentation of array comparative genomic hybridization (aCGH) data have been proposed. For array designs in which clones are of unequal lengths, are unevenly spaced or overlap, the discrete-index view typically adopted by such methods may be questionable or open to improvement. RESULTS: We describe a continuous-index hidden Markov model for aCGH data as well as a Monte Carlo EM algorithm to estimate its parameters. It is shown that for a dataset from the BT-474 cell line analysed on 32K BAC tiling microarrays, this model yields considerably better model fit in terms of lag-1 residual autocorrelations compared to a discrete-index HMM, and it is also shown how to use the model for e.g. estimation of change points on the base-pair scale and for estimation of conditional state probabilities across the genome. In addition, the model is applied to the Glioblastoma Multiforme data used in the comparative study by Lai et al. (Lai,W.R. et al. (2005) Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics, 21, 3763-3770.) giving results similar to theirs but with certain features highlighted in the continuous-index setting.
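One way to see what a continuous index buys is to look at how state-change probabilities depend on the physical gap between clones. The sketch below uses the standard closed form for a two-state continuous-time Markov chain, where the chance of switching state grows with the distance travelled. It is an illustration only: the two-state restriction, the function name and the rate values are assumptions, and the abstract does not describe the authors' parameterization.

```python
import math


def two_state_transition(distance, rate_01, rate_10):
    """Transition matrix over a gap `distance` for a two-state continuous-index chain.

    rate_01 and rate_10 are the (hypothetical) switching rates per unit distance
    from state 0 to 1 and from state 1 to 0.
    """
    total = rate_01 + rate_10
    decay = math.exp(-total * distance)
    stay_0 = (rate_10 + rate_01 * decay) / total
    stay_1 = (rate_01 + rate_10 * decay) / total
    return [[stay_0, 1.0 - stay_0],
            [1.0 - stay_1, stay_1]]


# Hypothetical rates per base pair; gaps between clone midpoints in base pairs.
for gap in (1e3, 1e5, 1e7):
    matrix = two_state_transition(gap, rate_01=2e-6, rate_10=1e-6)
    print(gap, [[round(p, 4) for p in row] for row in matrix])
```

Closely spaced or overlapping clones get a transition matrix near the identity, while widely separated clones approach the stationary distribution. A discrete-index HMM, by contrast, applies the same matrix to every pair of neighbors regardless of spacing.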
11.
Modeling recurrent DNA copy number alterations in array CGH data (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Recurrent DNA copy number alterations (CNA) measured with array comparative genomic hybridization (aCGH) reveal important molecular features of human genetics and disease. Studying aCGH profiles from a phenotypic group of individuals can determine important recurrent CNA patterns that suggest a strong correlation to the phenotype. Computational approaches to detecting recurrent CNAs from a set of aCGH experiments have typically relied on discretizing the noisy log ratios and subsequently inferring patterns. We demonstrate that this can have the effect of filtering out important signals present in the raw data. In this article we develop statistical models that jointly infer CNA patterns and the discrete labels by borrowing statistical strength across samples. RESULTS: We propose extending single sample aCGH HMMs to the multiple sample case in order to infer shared CNAs. We model recurrent CNAs as a profile encoded by a master sequence of states that generates the samples. We show how to improve on two basic models by performing joint inference of the discrete labels and providing sparsity in the output. We demonstrate on synthetic ground truth data and real data from lung cancer cell lines how these two important features of our model improve results over baseline models. We include standard quantitative metrics and a qualitative assessment on which to base our conclusions. AVAILABILITY: http://www.cs.ubc.ca/~sshah/acgh.
12.
Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data
Klijn C Holstege H de Ridder J Liu X Reinders M Jonkers J Wessels L 《Nucleic acids research》2008,36(2):e13
Tumor formation is in part driven by DNA copy number alterations (CNAs), which can be measured using microarray-based Comparative Genomic Hybridization (aCGH). Multiexperiment analysis of aCGH data from tumors allows discovery of recurrent CNAs that are potentially causal to cancer development. Until now, multiexperiment aCGH data analysis has been dependent on discretization of measurement data to a gain, loss or no-change state. Valuable biological information is lost when a heterogeneous system such as a solid tumor is reduced to these states. We have developed a new approach which inputs nondiscretized aCGH data to identify regions that are significantly aberrant across an entire tumor set. Our method is based on kernel regression and accounts for the strength of a probe's signal, its local genomic environment and the signal distribution across multiple tumors. In an analysis of 89 human breast tumors, our method showed enrichment for known cancer genes in the detected regions and identified aberrations that are strongly associated with breast cancer subtypes and clinical parameters. Furthermore, we identified 18 recurrent aberrant regions in a new dataset of 19 p53-deficient mouse mammary tumors. These regions, combined with gene expression microarray data, point to known cancer genes and novel candidate cancer genes.
13.
14.
A method for calling gains and losses in array CGH data (total citations: 11; self-citations: 0; citations by others: 11)
Array CGH is a powerful technique for genomic studies of cancer. It enables one to carry out genome-wide screening for regions of genetic alterations, such as chromosome gains and losses, or localized amplifications and deletions. In this paper, we propose a new algorithm 'Cluster along chromosomes' (CLAC) for the analysis of array CGH data. CLAC builds hierarchical clustering-style trees along each chromosome arm (or chromosome), and then selects the 'interesting' clusters by controlling the False Discovery Rate (FDR) at a certain level. In addition, it provides a consensus summary across a set of arrays, as well as an estimate of the corresponding FDR. We illustrate the method using an application of CLAC on a lung cancer microarray CGH data set as well as a BAC array CGH data set of aneuploid cell strains.
15.
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
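The dynamic-programming half of such a procedure can be illustrated on its own. The sketch below finds, for a fixed number of segments, the breakpoints that minimize the within-segment sum of squares, which is the maximum-likelihood segmentation when only the mean changes and the noise is Gaussian with constant variance. It leaves out the mixture model, the EM step and the model-selection heuristic entirely, so it is not DP-EM; the function names and toy profile are hypothetical.

```python
def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean for y[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def best_segmentation(y, n_segments):
    """Return (change_points, cost) minimising the within-segment sum of squares."""
    n = len(y)
    prefix, prefix_sq = [0.0], [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)
        prefix_sq.append(prefix_sq[-1] + value * value)

    inf = float("inf")
    # cost[k][t]: best cost of splitting y[:t] into k segments.
    cost = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for t in range(k, n + 1):
            for s in range(k - 1, t):
                candidate = cost[k - 1][s] + segment_cost(prefix, prefix_sq, s, t)
                if candidate < cost[k][t]:
                    cost[k][t] = candidate
                    back[k][t] = s

    # Walk the back-pointers to recover where each segment starts.
    starts, t = [], n
    for k in range(n_segments, 0, -1):
        t = back[k][t]
        starts.append(t)
    # Drop the leading 0 so only true change points remain.
    return sorted(starts)[1:], cost[n_segments][n]


# Toy log-ratio profile: baseline, a gained region, baseline again.
profile = [0.1, -0.1, 0.0, 0.1, 1.0, 1.1, 0.9, 1.0, 0.0, -0.1, 0.1]
change_points, total_cost = best_segmentation(profile, n_segments=3)
print(change_points, total_cost)
```

Each returned change point is the index at which a new segment starts. The triple loop costs on the order of K·N² operations for N probes and K segments, and the number of segments must be supplied, which is why a separate selection rule is needed in practice.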
16.
MOTIVATION: Chromosomal copy number changes (aneuploidies) are common in cell populations that undergo multiple cell divisions including yeast strains, cell lines and tumor cells. Identification of aneuploidies is critical in evolutionary studies, where changes in copy number serve an adaptive purpose, as well as in cancer studies, where amplifications and deletions of chromosomal regions have been identified as a major pathogenetic mechanism. Aneuploidies can be studied on whole-genome level using array CGH (a microarray-based method that measures the DNA content), but their presence also affects gene expression. In gene expression microarray analysis, identification of copy number changes is especially important in preventing aberrant biological conclusions based on spurious gene expression correlation or masked phenotypes that arise due to aneuploidies. Previously suggested approaches for aneuploidy detection from microarray data mostly focus on array CGH, address only whole-chromosome or whole-arm copy number changes, and rely on thresholds or other heuristics, making them unsuitable for fully automated general application to gene expression datasets. There is a need for a general and robust method for identification of aneuploidies of any size from both array CGH and gene expression microarray data. RESULTS: We present ChARM (Chromosomal Aberration Region Miner), a robust and accurate expectation-maximization based method for identification of segmental aneuploidies (partial chromosome changes) from gene expression and array CGH microarray data. Systematic evaluation of the algorithm on synthetic and biological data shows that the method is robust to noise, aneuploidal segment size and P-value cutoff. Using our approach, we identify known chromosomal changes and predict novel potential segmental aneuploidies in commonly used yeast deletion strains and in breast cancer. 
ChARM can be routinely used to identify aneuploidies in array CGH datasets and to screen gene expression data for aneuploidies or array biases. Our methodology is sensitive enough to detect statistically significant and biologically relevant aneuploidies even when expression or DNA content changes are subtle as in mixed populations of cells. AVAILABILITY: Code available by request from the authors and on Web supplement at http://function.cs.princeton.edu/ChARM/
17.
18.
A new look towards BAC-based array CGH through a comprehensive comparison with oligo-based array CGH
Nicolas Wicker Annaïck Carles Ian G Mills Maija Wolf Abhi Veerakumarasivam Henrik Edgren Fabrice Boileau Bohdan Wasylyk Jack A Schalken David E Neal Olli Kallioniemi Olivier Poch 《BMC genomics》2007,8(1):1-10
Background
Natural populations of the teleost fish Fundulus heteroclitus tolerate a broad range of environmental conditions including temperature, salinity, hypoxia and chemical pollutants. Strikingly, populations of Fundulus inhabit and have adapted to highly polluted Superfund sites that are contaminated with persistent toxic chemicals. These natural populations provide a foundation to discover critical gene pathways that have evolved in a complex natural environment in response to environmental stressors.
Results
We used Fundulus cDNA arrays to compare metabolic gene expression patterns in the brains of individuals among nine populations: three independent, polluted Superfund populations and two genetically similar, reference populations for each Superfund population. We found that up to 17% of metabolic genes have evolved adaptive changes in gene expression in these Superfund populations. Among these genes, two (1.2%) show a conserved response among three polluted populations, suggesting common, independently evolved mechanisms for adaptation to environmental pollution in these natural populations.
Conclusion
Significant differences among individuals between polluted and reference populations, statistical analyses indicating shared adaptive changes among the Superfund populations, and lack of reduction in gene expression variation suggest that common mechanisms of adaptive resistance to anthropogenic pollutants have evolved independently in multiple Fundulus populations. Among three independent, Superfund populations, two genes have a common response indicating that high selective pressures may favor specific responses.
19.
SUMMARY: We have developed a new method (BioHMM) for segmenting array comparative genomic hybridization data into states with the same underlying copy number. By utilizing a heterogeneous hidden Markov model, BioHMM incorporates relevant biological factors (e.g. the distance between adjacent clones) in the segmentation process.
20.
MOTIVATION: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. RESULTS: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data. AVAILABILITY: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
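The quadratic cost mentioned above is easy to see in a direct implementation of the full permutation test. The sketch below scans every arc of a profile, computes a two-sample t-statistic comparing the arc with the rest, and compares the maximum against the maxima from shuffled copies of the data. It is a plain illustration of a maximal statistic with a permutation reference distribution, written under assumptions: the pooled-variance form of the t-statistic, the function names, the toy profile and the number of permutations are all chosen for this example. It is not the DNAcopy code and does not include the hybrid linear-time approximation or the early-stopping rule.

```python
import math
import random


def max_arc_statistic(x):
    """Largest absolute two-sample t-statistic over all arcs x[i:j] versus the rest."""
    n = len(x)
    prefix, prefix_sq = [0.0], [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
        prefix_sq.append(prefix_sq[-1] + value * value)
    total, total_sq = prefix[n], prefix_sq[n]

    best, best_arc = 0.0, None
    for i in range(n):                  # the double loop is the O(N^2) part
        for j in range(i + 1, n + 1):
            n_in, n_out = j - i, n - (j - i)
            if n_out == 0:
                continue                # an arc must leave at least one point outside
            sum_in = prefix[j] - prefix[i]
            sq_in = prefix_sq[j] - prefix_sq[i]
            mean_in, mean_out = sum_in / n_in, (total - sum_in) / n_out
            # Pooled within-group sum of squares (inside the arc plus outside it).
            ss = (sq_in - n_in * mean_in ** 2) + (total_sq - sq_in - n_out * mean_out ** 2)
            pooled_var = max(ss, 0.0) / (n - 2)
            if pooled_var == 0.0:
                continue                # degenerate split, skipped in this sketch
            t = abs(mean_in - mean_out) / math.sqrt(pooled_var * (1 / n_in + 1 / n_out))
            if t > best:
                best, best_arc = t, (i, j)
    return best, best_arc


def permutation_p_value(x, n_permutations=200, seed=0):
    """Share of shuffled profiles whose maximal statistic reaches the observed one."""
    rng = random.Random(seed)
    observed, arc = max_arc_statistic(x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if max_arc_statistic(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1), arc


# Toy profile of 50 markers: small deterministic wobble plus a gain on markers 20..29.
profile = [0.1 * ((7 * k) % 5 - 2) + (1.0 if 20 <= k < 30 else 0.0) for k in range(50)]
p_value, arc = permutation_p_value(profile)
print(arc, p_value)
```

Every permutation repeats the full arc scan, so the total work is the number of permutations times N², which is manageable for 50 markers and quickly becomes impractical for tens of thousands.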