Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
MOTIVATION: Array comparative genomic hybridization (CGH) allows detection and mapping of copy number of DNA segments. A challenge is to make inferences about the copy number structure of the genome. Several statistical methods have been proposed to determine genomic segments with different copy number levels. However, to date, no comprehensive comparison of various characteristics of these methods exists. Moreover, the segmentation results have not been utilized in downstream analyses. RESULTS: We describe a comparison of three popular and publicly available methods for the analysis of array CGH data and we demonstrate how segmentation results may be utilized in downstream analyses such as testing and classification, yielding higher power and prediction accuracy. Since the methods operate on individual chromosomes, we also propose a novel procedure for merging segments across the genome, which results in an interpretable set of copy number levels and thus facilitates identification of copy number alterations in each genome. AVAILABILITY: http://www.bioconductor.org

2.

Background

Array-based comparative genomic hybridization (array CGH) is a highly efficient technique, allowing the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci and the reliable detection of local one-copy-level variations. Characterization of these DNA copy number changes is important for both the basic understanding of cancer and its diagnosis. To develop effective methods for identifying aberration regions from array CGH data, much recent research has focused on both smoothing-based and segmentation-based data processing. In this paper, we propose a stationary wavelet packet transform based approach to smooth array CGH data. Our purpose is to remove CGH noise across the whole frequency range while keeping the true signal by using a bivariate model.

Results

In both synthetic and real CGH data, the Stationary Wavelet Packet Transform (SWPT) is the best wavelet transform for analyzing the CGH signal across the whole frequency range. We also introduce a new bivariate shrinkage model which captures the relationship between noisy CGH coefficients at two scales in the SWPT. Before smoothing, symmetric extension is applied as a preprocessing step to preserve information at the borders.

Conclusion

We have designed the SWPT and the SWPT-Bi, which use the stationary wavelet packet transform with hard thresholding and with the new bivariate shrinkage estimator, respectively, to smooth array CGH data. We demonstrate the effectiveness of our approach through theoretical and experimental exploration of a set of array CGH data, including both synthetic data and real data. The comparison results show that our method outperforms the previous approaches.
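
As a rough illustration of the hard-thresholding step mentioned in this abstract, the sketch below zeroes small transform coefficients and, for contrast, shows the common soft-thresholding rule. It assumes the coefficients have already been computed by some wavelet transform; the function names and the threshold value are hypothetical, and neither the SWPT itself nor the authors' bivariate shrinkage estimator is reproduced here.

```python
import math
from typing import List, Sequence


def hard_threshold(coeffs: Sequence[float], t: float) -> List[float]:
    """Keep coefficients whose magnitude exceeds t; set the rest to zero."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft_threshold(coeffs: Sequence[float], t: float) -> List[float]:
    """Shrink every coefficient toward zero by t (shown only for contrast)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


# Hypothetical detail coefficients and threshold, chosen for illustration only.
detail = [0.05, -0.8, 0.2, 1.5]
kept = hard_threshold(detail, 0.3)
shrunk = soft_threshold(detail, 0.3)
```

Hard thresholding leaves the surviving coefficients untouched, whereas soft thresholding also reduces their magnitude, which is one reason the two rules can smooth the same profile differently.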

3.
Comparative genomic hybridization (CGH) microarrays have been used to determine copy number variations (CNVs) and their effects on complex diseases. Detection of absolute CNVs independent of genomic variants of an arbitrary reference sample has been a critical issue in CGH array experiments. Whole genome analysis using massively parallel sequencing with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform. The algorithm enables the removal and rescue of false positive and false negative CNVs, respectively, which appear due to the effects of genomic variants of the reference sample in raw CGH array experiments. We found that the CARA remarkably enhanced the accuracy of CGH array in determining absolute CNVs. Our method thus provides a new approach to interpret CGH array data for personalized medicine.

4.
Detection of chromosomal aberrations from a single cell by array comparative genomic hybridization (single-cell array CGH), instead of from a population of cells, is an emerging technique. However, such detection is challenging because of the genome artifacts and the DNA amplification process inherent to the single cell approach. Current normalization algorithms result in inaccurate aberration detection for single-cell data. We propose a normalization method based on channel, genome composition and recurrent genome artifact corrections. We demonstrate that the proposed channel clone normalization significantly improves the copy number variation detection in both simulated and real single-cell array CGH data.

5.
Array-based comparative genomic hybridization (aCGH) has gained prevalence as an effective technique for measuring structural variations in the genome. Copy-number variations (CNVs) form a large source of genomic structural variation, but it is not known whether phenotypic differences between intra-species groups, such as divergent human populations, or breeds of a domestic animal, can be attributed to CNVs. Several computational methods have been proposed to improve the detection of CNVs from array CGH data, but few population studies have used CGH data for identification of intra-species differences. In this paper we propose a novel method of genome-wide comparison and classification using CGH data that condenses whole genome information, aimed at quantification of intra-species variations and discovery of shared ancestry. Our strategy included smoothing CGH data using an appropriate denoising algorithm, extracting features via wavelets, quantifying the information via wavelet power spectrum and hierarchical clustering of the resultant profile. To evaluate the classification efficiency of our method, we used simulated data sets. We applied it to aCGH data from human and bovine individuals and showed that it successfully detects existing intra-specific variations with additional evolutionary implications.

6.
This paper presents an extension of the joint modeling strategy for the case of multiple longitudinal outcomes and repeated infections of different types over time, motivated by postkidney transplantation data. Our model comprises two parts linked by shared latent terms. On the one hand is a multivariate mixed linear model with random effects, where a low‐rank thin‐plate spline function is incorporated to collect the nonlinear behavior of the different profiles over time. On the other hand is an infection‐specific Cox model, where the dependence between different types of infections and the related times of infection is through a random effect associated with each infection type to catch the within dependence and a shared frailty parameter to capture the dependence between infection types. We implemented the parameterization used in joint models which uses the fitted longitudinal measurements as time‐dependent covariates in a relative risk model. Our proposed model was implemented in OpenBUGS using the MCMC approach.

7.
Comparative genome hybridization (CGH) to DNA microarrays (array CGH) is a technique capable of detecting deletions and duplications in genomes at high resolution. However, array CGH studies of the human genome noting false negative and false positive results using large insert clones as probes have raised important concerns regarding the suitability of this approach for clinical diagnostic applications. Here, we adapt the Smith–Waterman dynamic-programming algorithm to provide a sensitive and robust analytic approach (SW-ARRAY) for detecting copy-number changes in array CGH data. In a blind series of hybridizations to arrays consisting of the entire tiling path for the terminal 2 Mb of human chromosome 16p, the method identified all monosomies between 267 and 1567 kb with a high degree of statistical significance and accurately located the boundaries of deletions in the range 267–1052 kb. The approach is unique in offering both a nonparametric segmentation procedure and a nonparametric test of significance. It is scalable and well-suited to high resolution whole genome array CGH studies that use array probes derived from large insert clones as well as PCR products and oligonucleotides.
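
The core idea of a Smith–Waterman-style search over a one-dimensional profile can be sketched as finding the contiguous run of probes with the largest cumulative score, where the running sum is reset whenever it drops to zero. The code below is a minimal illustration of that generic recursion, not the published SW-ARRAY implementation: the function name, the way scores are derived from log-ratios, and the threshold `t = 0.3` are all assumptions made for the example, and the nonparametric significance test described in the abstract is not shown.

```python
from typing import List, Sequence, Tuple


def max_scoring_segment(scores: Sequence[float]) -> Tuple[float, int, int]:
    """Return (best_score, start, end) of the highest-scoring contiguous run.

    Uses the local-alignment style recursion S_i = max(S_{i-1} + x_i, 0).
    `end` is inclusive; (0.0, -1, -1) is returned if no run scores above zero.
    """
    best, best_start, best_end = 0.0, -1, -1
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        running += s
        if running <= 0.0:
            # Reset: any segment crossing this point would score no better
            # than one starting fresh at the next probe.
            running, start = 0.0, i + 1
        elif running > best:
            best, best_start, best_end = running, start, i
    return best, best_start, best_end


# Hypothetical log2 ratios along a chromosome; probes 2..4 look deleted.
log_ratios: List[float] = [0.05, -0.1, -0.6, -0.7, -0.5, 0.1, 0.0]
t = 0.3  # assumed threshold, for illustration only
# To look for losses, flip the sign and subtract the threshold so that
# probes near zero contribute negative scores.
deletion_scores = [-x - t for x in log_ratios]
best, start, end = max_scoring_segment(deletion_scores)
```

Searching for gains would use `x - t` instead of `-x - t`; in either case the threshold controls how strongly a probe must deviate before it extends a candidate segment.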

8.
The availability of high resolution array comparative genomic hybridization (CGH) platforms has led to increasing complexities in data analysis. Specifically, defining contiguous regions of alterations or segmentation can be computationally intensive and popular algorithms can take hours to days for the processing of arrays comprised of hundreds of thousands to millions of elements. Additionally, tumors tend to demonstrate subtle copy number alterations due to heterogeneity, ploidy and hybridization effects. Thus, there is a need for fast, sensitive array CGH segmentation and alteration calling algorithms. Here, we describe Fast Algorithm for Calling After Detection of Edges (FACADE), a highly sensitive and easy to use algorithm designed to rapidly segment and call high resolution array data.

9.
Summary: In National Toxicology Program (NTP) studies, investigators want to assess whether a test agent is carcinogenic overall and specific to certain tumor types, while estimating the dose‐response profiles. Because there are potentially correlations among the tumors, a joint inference is preferred to separate univariate analyses for each tumor type. In this regard, we propose a random effect logistic model with a matrix of coefficients representing log‐odds ratios for the adjacent dose groups for tumors at different sites. We propose appropriate nonparametric priors for these coefficients to characterize the correlations and to allow borrowing of information across different dose groups and tumor types. Global and local hypotheses can be easily evaluated by summarizing the output of a single Markov chain Monte Carlo (MCMC) run. Two multiple testing procedures are applied for testing local hypotheses based on the posterior probabilities of local alternatives. Simulation studies are conducted and an NTP tumor data set is analyzed illustrating the proposed approach.

10.
MOTIVATION: In recent years, a range of techniques for analysis and segmentation of array comparative genomic hybridization (aCGH) data have been proposed. For array designs in which clones are of unequal lengths, are unevenly spaced or overlap, the discrete-index view typically adopted by such methods may be questionable or open to improvement. RESULTS: We describe a continuous-index hidden Markov model for aCGH data as well as a Monte Carlo EM algorithm to estimate its parameters. It is shown that for a dataset from the BT-474 cell line analysed on 32K BAC tiling microarrays, this model yields considerably better model fit in terms of lag-1 residual autocorrelations compared to a discrete-index HMM, and it is also shown how to use the model for e.g. estimation of change points on the base-pair scale and for estimation of conditional state probabilities across the genome. In addition, the model is applied to the Glioblastoma Multiforme data used in the comparative study by Lai et al. (Lai,W.R. et al. (2005) Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics, 21, 3763-3770.) giving results similar to theirs but with certain features highlighted in the continuous-index setting.

11.
Summary: Most existing methods for identifying aberrant regions with array CGH data are confined to a single target sample. Focusing on the comparison of multiple samples from two different groups, we develop a new penalized regression approach with a fused adaptive lasso penalty to accommodate the spatial dependence of the clones. The nonrandom aberrant genomic segments are determined by assessing the significance of the differences between neighboring clones and neighboring segments. The algorithm proposed in this article is a first attempt to simultaneously detect the common aberrant regions within each group, and the regions where the two groups differ in copy number changes. The simulation study suggests that the proposed procedure outperforms the commonly used single‐sample aberration detection methods for segmentation in terms of both false positives and false negatives. To further assess the value of the proposed method, we analyze a data set from a study that identified the aberrant genomic regions associated with grade subgroups of breast cancer tumors.

12.
Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.

13.
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.

14.
Summary: Accurate estimation of DNA copy numbers from array comparative genomic hybridization (CGH) data is important for characterizing the cancer genome. An important part of this process is the segmentation of the log-ratios between the sample and control DNA along the chromosome into regions of different copy numbers. However, multiple algorithms are available in the literature for this procedure and the results can vary substantially among these. Thus, a visualization tool that can display the segmented profiles from a number of methods can be helpful to the biologist or the clinician to ascertain that a feature of interest did not arise as an artifact of the algorithm. Such a tool also allows the methodologist to easily contrast his method against others. We developed a web-based tool that applies a number of popular algorithms to a single array CGH profile entered by the user. It generates a heatmap panel of the segmented profiles for each method as well as a consensus profile. The clickable heatmap can be moved along the chromosome and zoomed in or out. It also displays the time that each algorithm took and provides numerical values of the segmented profiles for download. The web interface calls algorithms written in the statistical language R. We encourage developers of new algorithms to submit their routines to be incorporated into the website. Availability: http://compbio.med.harvard.edu/CGHweb Contact: peter_park{at}harvard.edu Associate Editor: Keith Crandall

15.
A method for calling gains and losses in array CGH data   Total citations: 11 (self-citations: 0, citations by others: 11)
Array CGH is a powerful technique for genomic studies of cancer. It enables one to carry out genome-wide screening for regions of genetic alterations, such as chromosome gains and losses, or localized amplifications and deletions. In this paper, we propose a new algorithm 'Cluster along chromosomes' (CLAC) for the analysis of array CGH data. CLAC builds hierarchical clustering-style trees along each chromosome arm (or chromosome), and then selects the 'interesting' clusters by controlling the False Discovery Rate (FDR) at a certain level. In addition, it provides a consensus summary across a set of arrays, as well as an estimate of the corresponding FDR. We illustrate the method using an application of CLAC on a lung cancer microarray CGH data set as well as a BAC array CGH data set of aneuploid cell strains.

16.
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
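
To make the dynamic-programming half of such a segmentation concrete, the sketch below splits a profile into a fixed number of segments so that the total within-segment squared error is minimal. It is a generic textbook formulation under stated assumptions: the number of segments `k` is supplied by the caller rather than chosen by a model selection heuristic, the cost is a plain sum of squares, and the mixture-model (EM) part of DP-EM is not included. The function name `segment_dp` is hypothetical.

```python
from typing import List, Sequence


def segment_dp(y: Sequence[float], k: int) -> List[int]:
    """Return the start indices of the k segments of y that minimize the
    total within-segment sum of squared deviations from the segment mean."""
    n = len(y)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(y)")

    # Prefix sums of y and y^2 give each segment's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i: int, j: int) -> float:
        """Squared error of the half-open slice y[i:j] around its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[q][j]: minimal cost of cutting y[:j] into q segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for q in range(1, k + 1):
        for j in range(q, n + 1):
            for i in range(q - 1, j):
                c = best[q - 1][i] + cost(i, j)
                if c < best[q][j]:
                    best[q][j], back[q][j] = c, i

    # Walk the back-pointers from the end to recover segment starts.
    starts: List[int] = []
    j = n
    for q in range(k, 0, -1):
        j = back[q][j]
        starts.append(j)
    return starts[::-1]


# Hypothetical noise-free profile with three levels.
profile = [0, 0, 0, 1, 1, 1, -1, -1]
starts = segment_dp(profile, 3)
```

The triple loop costs O(k n^2) time, which is fine for a single chromosome arm but is one reason faster heuristics are often preferred on very dense arrays.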

17.
Garnis C, Coe BP, Lam SL, MacAulay C, Lam WL. Genomics, 2005, 85(6): 790-793
Recent advances in array comparative genomic hybridization (array CGH) technology are revolutionizing our understanding of tumor genomes. Marker-based arrays enable rapid survey at megabase intervals, while tiling path arrays examine the entire genome in unprecedented detail. Tumor biopsies are typically small and contain infiltrating stromal cells, requiring tedious microdissection. Tissue heterogeneity is a major barrier to high-throughput profiling of tumor genomes and is also an important consideration for the introduction of array CGH to clinical settings. We propose that increasing array resolution will enhance detection sensitivity in mixed tissues and as a result significantly reduce microdissection requirements. In this study, we first simulated normal cell contamination to determine the heterogeneity tolerance of array CGH and then validated this detection sensitivity model on cancer specimens using the newly developed submegabase resolution tiling-set (SMRT) array, which spans the human genome with 32,433 overlapping BAC clones.

18.
CGHcall achieves high calling accuracy for array CGH data by effective use of breakpoint information from segmentation and by inclusion of several biological concepts that are ignored by existing algorithms. The algorithm is validated for simulated and verified real array CGH data. By incorporating more than three classes, CGHcall improves detection of single copy gains and amplifications. Moreover, it allows effective inclusion of chromosome arm information. AVAILABILITY: An R-package (GUI), a manual and an example data set are available at http://www.few.vu.nl/~mavdwiel/CGHcall.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
The development of high-throughput screening methods such as array-based comparative genome hybridization (array CGH) allows screening of the human genome for copy-number changes. Current array CGH strategies have limits of resolution that make small (less than a few tens of kilobases) gains or losses of genomic DNA difficult to detect. We report here a significant improvement in the resolution of array CGH, with the development of an array platform that utilizes single-stranded DNA array elements to accurately measure copy-number changes of individual exons in the human genome. Using this technology, we screened 31 patient samples across an array containing a total of 162 exons for five disease genes and detected copy-number changes, ranging from whole-gene deletions and duplications to single-exon deletions and duplications, in 100% of the cases. Our data demonstrate that it is possible to screen the human genome for copy-number changes with array CGH at a resolution that is 2 orders of magnitude higher than that previously reported.

20.
Summary: Expressed sequence tag (EST) sequencing is a one‐pass sequencing reading of cloned cDNAs derived from a certain tissue. The frequency of unique tags among different unbiased cDNA libraries is used to infer the relative expression level of each tag. In this article, we propose a hierarchical multinomial model with a nonlinear Dirichlet prior for the EST data with multiple libraries and multiple types of tissues. A novel hierarchical prior is developed and the properties of the proposed prior are examined. An efficient Markov chain Monte Carlo algorithm is developed for carrying out the posterior computation. We also propose a new selection criterion for detecting which genes are differentially expressed between two tissue types. Our new method with the new gene selection criterion is demonstrated via several simulations to have low false negative and false positive rates. A real EST data set is used to motivate and illustrate the proposed method.
