Similar literature
20 similar articles found.
1.
MOTIVATION: Genomic instability in cancer leads to abnormal genome copy number alterations (CNA) that are associated with the development and behavior of tumors. Advances in microarray technology have allowed for greater resolution in detection of DNA copy number changes (amplifications or deletions) across the genome. However, the increase in the number of measured signals and the accompanying noise from the array probes present a challenge in accurate and fast identification of the breakpoints that define CNA. This article proposes a novel detection technique that uses piecewise constant (PWC) vectors to represent genome copy number and sparse Bayesian learning (SBL) to detect CNA breakpoints. METHODS: First, a compact linear algebra representation for the genome copy number is developed from normalized probe intensities. Second, SBL is applied and optimized to infer locations where copy number changes occur. Third, a backward elimination (BE) procedure is used to rank the inferred breakpoints, and a cut-off point can be efficiently adjusted in this procedure to control the false discovery rate (FDR). RESULTS: The performance of our algorithm is evaluated using simulated and real genome datasets and compared to other existing techniques. Our approach achieves the highest accuracy and lowest FDR while improving computational speed by several orders of magnitude. The proposed algorithm has been developed into a free-standing software application (GADA, Genome Alteration Detection Algorithm). AVAILABILITY: http://biron.usc.edu/~piquereg/GADA
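The piecewise constant representation and the backward elimination step described in this abstract can be illustrated with a short Python sketch. It is not the GADA implementation: the SBL inference is skipped and candidate breakpoints are simply passed in. The t-like score, the function names (`pwc_from_weights`, `breakpoint_score`, `backward_elimination`) and the known noise level `sigma` are all assumptions made for illustration.

```python
import math

def pwc_from_weights(x0, weights):
    """Rebuild a piecewise constant vector as a running sum of sparse jump weights.

    A nonzero weights[i] marks a breakpoint at position i (illustrative encoding).
    """
    x, level = [], x0
    for w in weights:
        level += w
        x.append(level)
    return x

def breakpoint_score(y, left, mid, right, sigma):
    """t-like score for the jump at `mid` between y[left:mid] and y[mid:right]."""
    n1, n2 = mid - left, right - mid
    m1 = sum(y[left:mid]) / n1
    m2 = sum(y[mid:right]) / n2
    return abs(m1 - m2) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))

def backward_elimination(y, candidates, sigma, threshold):
    """Drop the weakest breakpoint until every remaining one scores >= threshold.

    `candidates` are distinct indices in 1..len(y)-1 where a new segment starts.
    """
    kept = sorted(set(candidates))
    while kept:
        bounds = [0] + kept + [len(y)]
        scores = [
            breakpoint_score(y, bounds[i - 1], bounds[i], bounds[i + 1], sigma)
            for i in range(1, len(bounds) - 1)
        ]
        weakest = min(range(len(scores)), key=scores.__getitem__)
        if scores[weakest] >= threshold:
            break
        del kept[weakest]
    return kept
```

Raising `threshold` removes more breakpoints, which is one plausible reading of how an adjustable cut-off trades sensitivity against false discoveries; the abstract does not specify the exact ranking statistic.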

2.

Background

Genomic deletions and duplications are important in the pathogenesis of diseases, such as cancer and mental retardation, and have recently been shown to occur frequently in unaffected individuals as polymorphisms. Affymetrix GeneChip whole genome sampling analysis (WGSA) combined with 100 K single nucleotide polymorphism (SNP) genotyping arrays is one of several microarray-based approaches that are now being used to detect such structural genomic changes. The popularity of this technology and its associated open source data format have resulted in the development of an increasing number of software packages for the analysis of copy number changes using these SNP arrays.

Results

We evaluated four publicly available software packages for high-throughput copy number analysis using synthetic and empirical 100 K SNP array data sets, the latter obtained from 107 mental retardation (MR) patients and their unaffected parents and siblings. We evaluated the software with regard to overall suitability for high-throughput 100 K SNP array data analysis, effectiveness of normalization, scaling with various reference sets and feature extraction, and true and false positive rates of genomic copy number variant (CNV) detection.

Conclusion

We observed considerable variation among the numbers and types of candidate CNVs detected by different analysis approaches, and found that multiple programs were needed to find all real aberrations in our test set. The frequency of false positive deletions was substantial, but could be greatly reduced by using the SNP genotype information to confirm loss of heterozygosity.

3.
Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique to measure DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and therefore not associated with a disease in a clinical study, though correction for multiple testing takes them into account and thereby decreases the study's discovery power. For controlling the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can only deviate by strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.

4.
Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, the performance of currently available analytical tools suffers when applied to these data because of the lower signal-to-noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprising 385 000 to more than 3 million probes. wuHMM is unique in that it can utilize sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM's performance on the 385K platform by comparison to the higher resolution platforms and we independently validate 10 CNVs. The method requires no training data and is robust with respect to changes in algorithm parameters. At an FPR of <10%, the algorithm can detect CNVs with five probes on the 385K platform and three on the 2.1M and 3.1M platforms, resulting in effective resolutions of 24 kb, 2–5 kb and 1 kb, respectively.

5.
Genomic copy number variations (CNVs) are considered a significant source of genetic diversity and are widely involved in gene expression and regulatory mechanisms, genetic disorders and disease risk, susceptibility to certain diseases and conditions, and resistance to medical drugs. Many studies have targeted the identification, profiling, analysis, and associations of genetic CNVs. We propose herein two new fuzzy methods, that is, one based on fuzzy inference from the pre-processed input, and another based on fuzzy C-means clustering. Our solutions present a higher true positive rate and a lower false negative rate with no false positives, efficient performance and minimal resource consumption.
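To make the fuzzy C-means idea concrete, here is a minimal one-dimensional sketch in Python using the textbook update rules. It is not the authors' method: the abstract gives no details of their pre-processing or parameters, so the function name `fuzzy_c_means_1d`, the fuzzifier `m = 2`, the fixed iteration count and the toy log-ratio values in the usage example are all assumptions.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Alternate textbook FCM membership and center updates on 1-D data.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    values[i] belongs to cluster k, as computed in the last iteration.
    """
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            if 0.0 in dists:
                # The point coincides with one or more centers: split membership among them.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [w / total for w in inv]
            memberships.append(row)
        # Each center becomes the membership-weighted mean of the data.
        centers = [
            sum((u[k] ** m) * v for u, v in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(len(centers))
        ]
    return centers, memberships

# Usage: toy log2 ratios that look like a loss, a neutral state and a gain.
ratios = [-1.0, -0.9, -1.1, 0.0, 0.05, -0.05, 0.58, 0.6, 0.55]
final_centers, degrees = fuzzy_c_means_1d(ratios, centers=[-0.5, 0.0, 0.5])
```

Unlike hard clustering, each probe keeps a graded membership in every cluster, so borderline probes can be flagged rather than forced into a single copy number state.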

6.
7.

Background  

Genes that play an important role in tumorigenesis are expected to show association between DNA copy number and RNA expression. Optimal power to find such associations can only be achieved if analysing copy number and gene expression jointly. Furthermore, some copy number changes extend over larger chromosomal regions affecting the expression levels of multiple resident genes.

8.
9.

Background  

Recent advances in sequencing technologies have enabled generation of large-scale genome sequencing data. These data can be used to characterize a variety of genomic features, including the DNA copy number profile of a cancer genome. A robust and reliable method for screening chromosomal alterations would allow a detailed characterization of the cancer genome with unprecedented accuracy.

10.
Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, are not quite capable of detecting both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows a normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, can one exploit the characteristics of noise to develop novel analysis methods that are capable of accurately detecting the aberration regions as well as the boundary break points simultaneously? By analyzing bacterial artificial chromosome (BAC) arrays with an average 1 Mb resolution, 19 k oligo arrays with an average probe spacing <100 kb and 385 k oligo arrays with an average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than the Gaussian noise case. We further develop a novel method, which optimally exploits the character of the noise, and is capable of identifying both aberration regions and the boundary break points very accurately. Finally, we propose a new concept, posteriori signal-to-noise ratio (p-SNR), to assign a confidence level to each detected aberration region and its boundaries.

11.
ABSTRACT: BACKGROUND: An important question in genetic studies is to determine those genetic variants, in particular CNVs, that are specific to different groups of individuals. This could help in elucidating differences in disease predisposition and response to pharmaceutical treatments. We propose a Bayesian model designed to analyze thousands of copy number variants (CNVs) where only few of them are expected to be associated with a specific phenotype. RESULTS: The model is illustrated by analyzing three major human groups belonging to HapMap data. We also show how the model can be used to determine specific CNVs related to response to treatment in patients diagnosed with ovarian cancer. The model is also extended to address the problem of how to adjust for confounding covariates (e.g., population stratification). Through a simulation study, we show that the proposed model outperforms other approaches that are typically used to analyze this data when analyzing common copy-number polymorphisms (CNPs) or complex CNVs. We have developed an R package, called bayesGen, that implements the model and estimating algorithms. CONCLUSIONS: Our proposed model is useful to discover specific genetic variants when different subgroups of individuals are analyzed. The model can address studies with or without control group. By integrating all data in a unique model we can obtain a list of genes that are associated with a given phenotype as well as a different list of genes that are shared among the different subtypes of cases.

12.
The variability of human populations is determined in large part by two complementary factors: environment and genetic information. Genetic variation is caused by different genetic variants (polymorphisms and mutations) present in the human genome. Until recently it was thought that most of these variants are small changes of one or several nucleotides (SNPs), millions of which are present in the human genome. However, it was recently shown that there are also polymorphisms that extend over hundreds of thousands of DNA base pairs in the human genome. Such alterations, called copy number variation (CNV), often include genes and other functional genetic elements. In this article we present the general characteristics of copy number polymorphism and we discuss some examples of CNVs that influence human phenotypes.

13.

Background

Copy number variations (CNVs) of chromosomal region 22q11.2 are associated with a subset of patients with congenital heart disease (CHD). Accurate and efficient detection of CNV is important for genetic analysis of CHD. The aim of the study was to introduce a novel approach named CNVplex®, a high-throughput analysis technique designed for efficient detection of chromosomal CNVs, and to explore the prevalence of sub-chromosomal imbalances in 22q11.2 loci in patients with CHD from a single institute.

Results

We developed a novel technique, CNVplex®, for high-throughput detection of sub-chromosomal copy number aberrations. Modified from the multiplex ligation-dependent probe amplification (MLPA) method, it introduced a lengthening ligation system and four universal primer sets, which simplified the synthesis of probes and significantly improved the flexibility of the experiment. We used 110 samples, which were extensively characterized with chromosomal microarray analysis and MLPA, to validate the performance of the newly developed method. Furthermore, CNVplex® was used to screen for sub-chromosomal imbalances in 22q11.2 loci in 818 CHD patients consecutively enrolled from Shanghai Children’s Medical Center. In the methodology development phase, CNVplex® detected all copy number aberrations that were previously identified with both chromosomal microarray analysis and MLPA, demonstrating 100% sensitivity and specificity. In the validation phase, 22q11.2 deletion and 22q11.2 duplication were detected in 39 and 1 of 818 patients with CHD by CNVplex®, respectively. Our data demonstrated that the frequency of 22q11.2 deletion varied among sub-groups of CHD patients. Notably, 22q11.2 deletion was more commonly observed in cases with conotruncal defect (CTD) than in cases with non-CTD (P < 0.001). With higher resolution and more probes against selected chromosomal loci, CNVplex® also identified several individuals with small CNVs and alterations in other chromosomes.

Conclusions

CNVplex® is sensitive and specific in its detection of CNVs, and it is an alternative to MLPA for batch screening of pathogenetic CNVs in known genomic loci.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1590-5) contains supplementary material, which is available to authorized users.

14.
15.
Copy number variation is common in the human genome with many regions, overlapping thousands of genes, now known to be deleted or amplified. Aneuploidies and other forms of chromosomal imbalance have a wide range of adverse phenotypes and are a common cause of birth defects resulting in significant morbidity and mortality. “Normal” copy number variants (CNVs) embedded within the regions of chromosome imbalance may affect the clinical outcomes by altering the local copy number of important genes or regulatory regions: this could alleviate or exacerbate certain phenotypes. In this way CNVs may contribute to the clinical variability seen in many disorders caused by chromosomal abnormalities, such as the congenital heart defects (CHD) seen in ~40% of Down’s syndrome (DS) patients. Investigation of CNVs may therefore help to pinpoint critical genes or regulatory elements, elucidating the molecular mechanisms underlying these conditions, also shedding light on the aetiology of such phenotypes in people without major chromosome imbalances, and ultimately leading to their improved detection and treatment.

16.
Human genetic variants have long been known to play an important role in both Mendelian disorders and common diseases. Notably, pathogenic variants are not limited to single-nucleotide variants. It has become apparent that human diseases can also be caused by copy number variations (CNVs), especially patient-specific novel CNVs (Iafrate et al., 2004; Sebat et al., 2004; Redon

17.
In humans, copy number variations (CNVs) are a common source of phenotypic diversity and disease susceptibility. Facioscapulohumeral muscular dystrophy (FSHD) is an important genetic disease caused by CNVs. It is an autosomal-dominant myopathy caused by a reduction in the copy number of the D4Z4 macrosatellite repeat located at chromosome 4q35. Interestingly, the reduction of D4Z4 copy number is not sufficient by itself to cause FSHD. A number of epigenetic events appear to affect the severity of the disease, its rate of progression, and the distribution of muscle weakness. Indeed, recent findings suggest that virtually all levels of epigenetic regulation, from DNA methylation to higher order chromosomal architecture, are altered at the disease locus, causing the de-regulation of 4q35 gene expression and ultimately FSHD.

18.
Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq

19.
DNA sequence copy number is the number of copies of DNA at a region of a genome. Cancer progression often involves alterations in DNA copy number. Newly developed microarray technologies enable simultaneous measurement of copy number at thousands of sites in a genome. We have developed a modification of binary segmentation, which we call circular binary segmentation, to translate noisy intensity measurements into regions of equal copy number. The method is evaluated by simulation and is demonstrated on cell line data with known copy number alterations and on a breast cancer cell line data set.
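The core search step of circular binary segmentation can be sketched in Python as follows. The abstract does not state the test statistic, so this uses the commonly cited form: the mean inside an arc `y[i:j]` is compared with the mean of everything outside it, as if the ends of the sequence were joined in a circle. The function name `best_circular_split` is hypothetical, and the noise variance estimate, the significance test and the recursion on the resulting segments are deliberately left out.

```python
import math

def best_circular_split(y):
    """Find the arc y[i:j] whose mean differs most from the rest of the data.

    Returns (z, i, j), where z is the unscaled two-sample statistic
    (mean_in - mean_out) / sqrt(1/n_in + 1/n_out). Brute force, O(n^2).
    """
    n = len(y)
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    best = (0.0, None, None)
    for i in range(n):
        for j in range(i + 1, n + 1):
            inside = j - i
            outside = n - inside
            if outside == 0:
                continue  # the arc covers everything; nothing to compare against
            arc_sum = prefix[j] - prefix[i]
            mean_in = arc_sum / inside
            mean_out = (total - arc_sum) / outside
            z = (mean_in - mean_out) / math.sqrt(1.0 / inside + 1.0 / outside)
            if abs(z) > abs(best[0]):
                best = (z, i, j)
    return best

# Usage: a short gain embedded in a flat profile.
profile = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
z, start, end = best_circular_split(profile)
```

Searching over arcs rather than single split points lets one pass pick out an aberration buried in the middle of a chromosome, which plain binary segmentation handles poorly.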

20.
Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
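The idea of a mixture of Poissons over integer copy numbers can be illustrated with a small Python sketch of the posterior for one sample at one genomic position. This is not the cn.MOPS package or its fitting procedure. The expected count `lam` at copy number 2 and the mixture `weights` are taken as given, the rate for copy number `cn` is assumed to be `lam * cn / 2`, and the small `eps` that keeps the copy number 0 rate positive is a hypothetical choice.

```python
import math

def poisson_log_pmf(x, rate):
    """Log of the Poisson probability of observing count x at the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)

def copy_number_posterior(count, lam, weights, eps=0.05):
    """Posterior over copy numbers 0..len(weights)-1 for one read count.

    weights[cn] is the (positive) mixture weight of copy number cn;
    lam is the expected read count at copy number 2 (assumed known here).
    """
    logs = []
    for cn, w in enumerate(weights):
        rate = lam * (cn if cn > 0 else eps) / 2.0
        logs.append(math.log(w) + poisson_log_pmf(count, rate))
    top = max(logs)  # subtract the max before exponentiating, for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Usage: weights concentrated on copy number 2, as one would expect for most samples.
weights = [0.01, 0.04, 0.90, 0.04, 0.01]
posterior = copy_number_posterior(count=50, lam=100.0, weights=weights)
most_likely = max(range(len(posterior)), key=posterior.__getitem__)
```

Because the comparison is made across samples at a fixed position, a region that is uniformly hard to sequence changes `lam` for everyone rather than producing a spurious call, which matches the motivation given in the abstract.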
