Similar Articles
1.
Copy number variations (CNVs) are being used as genetic markers or functional candidates in gene-mapping studies. However, unlike single nucleotide polymorphism or microsatellite genotyping techniques, most CNV detection methods are limited to detecting total copy numbers, rather than copy number in each of the two homologous chromosomes. To address this issue, we developed a statistical framework for intensity-based CNV detection platforms using family data. Our algorithm identifies CNVs for a family simultaneously, thus avoiding the generation of calls with Mendelian inconsistency while maintaining the ability to detect de novo CNVs. Applications to simulated data and real data indicate that our method significantly improves both call rates and accuracy of boundary inference, compared to existing approaches. We further illustrate the use of Mendelian inheritance to infer SNP allele compositions in each of the two homologous chromosomes in CNV regions using real data. Finally, we applied our method to a set of families genotyped using both the Illumina HumanHap550 and Affymetrix genome-wide 5.0 arrays to demonstrate its performance on both inherited and de novo CNVs. In conclusion, our method produces accurate CNV calls, gives probabilistic estimates of CNV transmission and builds a solid foundation for the development of linkage and association tests utilizing CNVs.
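The Mendelian constraint this abstract relies on can be illustrated with a minimal sketch. It assumes each parent's copy number is already known separately for each homologous chromosome, which is a simplification: the published framework infers these jointly from probe intensities. The function names and the tuple input format are hypothetical.

```python
from itertools import product


def mendelian_child_totals(mother, father):
    """Child total copy numbers compatible with Mendelian inheritance.

    mother, father: hypothetical (h1, h2) tuples giving the copy number on
    each of the parent's two homologous chromosomes. The child receives one
    homolog from each parent, so its total is m + f for some m, f.
    """
    return {m + f for m, f in product(mother, father)}


def is_mendelian_consistent(child_total, mother, father):
    # A total outside the compatible set implies either a calling error
    # or a de novo CNV in the child.
    return child_total in mendelian_child_totals(mother, father)
```

For a normal mother `(1, 1)` and a father carrying a heterozygous deletion `(0, 1)`, the child can only have a total of 1 or 2. A child called with total copy number 0 is therefore flagged, either as an error or as a candidate de novo deletion.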

2.

Background

The advent of high-throughput sequencing methods has created a considerable number of technical challenges. Among them is the discovery of copy-number variations (CNVs) from whole-genome sequencing data. CNVs are genomic structural variations defined as a variation in the number of copies of a large genomic fragment, usually more than one kilobase. Here, we compare different CNV calling methods to assess their ability to consistently identify CNVs, by comparing the calls in 9 quartets, each including a pair of identical twins. The use of monozygotic twins provides a means of estimating the error rate of each algorithm by observing CNVs that are called inconsistently, given the rules of Mendelian inheritance and the assumption that twins share an identical genome. The similarity between the calls from the different tools and the advantage of combining call sets were also considered.

Results

ERDS and CNVnator performed best in terms of inherited CNV rate, with means of 0.74 and 0.70, respectively. Venn diagrams were generated to show the agreement between the different algorithms before and after filtering out familial inconsistencies. This filtering revealed a high number of false positives for CNVer and Breakdancer. The low overall agreement between the methods suggested that the tools are highly complementary when calling CNVs. The breakpoint sensitivity analysis indicated that CNVnator and ERDS achieved better resolution of CNV borders than the other tools. The highest inherited CNV rate (81%) was achieved by intersecting the calls of these two tools.

Conclusions

This study showed that ERDS and CNVnator provide good performance on whole genome sequencing data with respect to CNV consistency across families, CNV breakpoint resolution and CNV call specificity. The intersection of the calls from the two tools would be valuable for CNV genotyping pipelines.
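This study relies on two quantities: agreement between call sets and the inherited CNV rate. Both can be sketched as follows. This is a minimal illustration, not the study's pipeline. It assumes calls are (chromosome, start, end) tuples with half-open coordinates, and that two calls match when they share at least 50% reciprocal overlap. That threshold is a common convention, but the abstract does not specify one. All function names are hypothetical.

```python
from typing import List, Tuple

# A CNV call as (chromosome, start, end); half-open coordinates assumed.
Call = Tuple[str, int, int]


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Smaller of the two fractions overlap/len(a) and overlap/len(b)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def intersect_calls(set_a: List[Call], set_b: List[Call], min_ro: float = 0.5) -> List[Call]:
    """Calls in set_a supported by at least one call in set_b."""
    return [a for a in set_a if any(reciprocal_overlap(a, b) >= min_ro for b in set_b)]


def inherited_rate(child: List[Call], mother: List[Call], father: List[Call],
                   min_ro: float = 0.5) -> float:
    """Fraction of the child's calls that match a call in either parent."""
    if not child:
        return 0.0
    inherited = intersect_calls(child, mother + father, min_ro)
    return len(inherited) / len(child)
```

The combined call set described in the conclusions then corresponds to something like `intersect_calls(erds_calls, cnvnator_calls)`. The variable names here are hypothetical. Raising `min_ro` makes matching stricter, which lowers the inherited rate but also reduces spurious matches between unrelated calls.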

3.
Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing-based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping, or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost-effective CNV detection and validation for both basic and clinical applications.

4.

Background  

Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides an opportunity for rapidly assessing research and clinical samples for CNV content, as well as for determining the potential pathogenicity of identified variants. However, few informatics tools for accurate and efficient CNV detection and assessment currently exist.

5.

Background

The genetic contribution to sporadic amyotrophic lateral sclerosis (ALS) has not been fully elucidated. There are increasing efforts to characterise the role of copy number variants (CNVs) in human diseases; two previous studies concluded that CNVs may influence risk of sporadic ALS, with multiple rare CNVs more important than common CNVs. A little-explored issue surrounding genome-wide CNV association studies is that of post-calling filtering and merging of raw CNV calls. We undertook simulations to define filter thresholds and considered optimal ways of merging overlapping CNV calls for association testing, taking into consideration possibly overlapping or nested, but distinct, CNVs and boundary estimation uncertainty.

Methodology and Principal Findings

In this study we screened Illumina 300K SNP genotyping data from 730 ALS cases and 789 controls for copy number variation. Following quality control filters using thresholds defined by simulation, a total of 11321 CNV calls were made across 575 cases and 621 controls. Using region-based and gene-based association analyses, we identified several loci showing nominally significant association. However, the choice of criteria for combining calls for association testing has an impact on the ranking of the results by their significance. Several loci which were previously reported as being associated with ALS were identified here. However, of another 15 genes previously reported as exhibiting ALS-specific copy number variation, only four exhibited copy number variation in this study. Potentially interesting novel loci, including EEF1D, a translation elongation factor involved in the delivery of aminoacyl tRNAs to the ribosome (a process which has previously been implicated in genetic studies of spinal muscular atrophy) were identified but must be treated with caution due to concerns surrounding genomic location and platform suitability.

Conclusions and Significance

Interpretation of CNV association findings must take into account the effects of filtering and combining CNV calls when based on early genome-wide genotyping platforms and modest study sizes.
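The merging step discussed in this abstract can be illustrated with the simplest possible rule. The sketch assumes calls are (chromosome, start, end, sample) tuples with half-open coordinates. Any calls that overlap are unioned into a single region, and the samples carrying a call in that region are recorded for a later case/control count. The function name and input format are hypothetical, and this is not the merging strategy the study adopted.

```python
from collections import defaultdict


def merge_calls(calls):
    """Union overlapping (chrom, start, end, sample) calls into regions.

    Returns a list of (chrom, start, end, samples), where samples is the set
    of sample IDs with at least one call inside the merged region.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, sample in calls:
        by_chrom[chrom].append((start, end, sample))

    regions = []
    for chrom in sorted(by_chrom):
        current = None  # [start, end, samples] of the region being built
        for start, end, sample in sorted(by_chrom[chrom]):
            if current is not None and start < current[1]:
                # Overlaps the open region: extend it and record the sample.
                current[1] = max(current[1], end)
                current[2].add(sample)
            else:
                if current is not None:
                    regions.append((chrom, current[0], current[1], current[2]))
                current = [start, end, {sample}]
        if current is not None:
            regions.append((chrom, current[0], current[1], current[2]))
    return regions
```

A single long call can chain otherwise separate calls into one region. This union rule can therefore absorb nested but distinct CNVs, which is the kind of merging choice the abstract warns can change the ranking of association results.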

6.
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method, together with a complete analytic protocol, is freely available at https://sourceforge.net/projects/asgenseng/.

7.
The detection of copy number variants (CNVs) by array-based platforms provides valuable insight into understanding human diversity. However, suboptimal study design and data processing negatively affect CNV assessment. We quantitatively evaluate their impact when short-sequence oligonucleotide arrays (Affymetrix Genome-Wide Human SNP Array 6.0) are applied, by evaluating 42 HapMap samples for CNV detection. Several processing and segmentation strategies are implemented, and results are compared to CNV assessment obtained using an oligonucleotide array CGH platform designed to query CNVs at high resolution (Agilent). We quantitatively demonstrate that different reference models (e.g. single versus pooled sample reference) used to detect CNVs are a major source of inter-platform discrepancy (up to 30%) and that CNVs residing within segmental duplication regions (higher reference copy number) are significantly harder to detect (P < 0.0001). After adjusting Affymetrix data to mimic the Agilent experimental design (reference sample effect), we applied several common segmentation approaches and evaluated differential sensitivity and specificity for CNV detection, ranging from 39–77% and 86–100%, respectively, for non-segmental duplication regions, and from 18–55% and 39–77% for segmental duplications. Our results are relevant to any array-based CNV study and provide guidelines to optimize performance based on study-specific objectives.

8.
The genetic basis of phenotypic variation can be partially explained by the presence of copy-number variations (CNVs). Currently available methods for CNV assessment include high-density single-nucleotide polymorphism (SNP) microarrays that have become an indispensable tool in genome-wide association studies (GWAS). However, insufficient concordance rates between different CNV assessment methods call for cautious interpretation of results from CNV-based genetic association studies. Here we provide a cross-population, microarray-based map of copy-number variant regions (CNVRs) to enable reliable interpretation of CNV association findings. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to scan the genomes of 1167 individuals from two ethnically distinct populations (Europe, N=717; Rwanda, N=450). Three different CNV-finding algorithms were tested and compared for sensitivity, specificity, and feasibility. Two algorithms were subsequently used to construct CNVR maps, which were also validated by processing subsamples with additional microarray platforms (Illumina 1M-Duo BeadChip, Nimblegen 385K aCGH array) and by comparing our data with publicly available information. Both algorithms detected a total of 42669 CNVs, 74% of which clustered in 385 CNVRs of a cross-population map. These CNVRs overlap with 862 annotated genes and account for approximately 3.3% of the haploid human genome. We created comprehensive cross-population CNVR maps. They represent an extendable framework that can facilitate the detection of common CNVs and additionally assist in interpreting CNV-based association studies.

9.
Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, the performance of currently available analytical tools suffers when applied to these data because of the lower signal-to-noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprising 385,000 to more than 3 million probes. wuHMM is unique in that it can use sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM's performance on the 385K platform by comparison with the higher-resolution platforms, and we independently validate 10 CNVs. The method requires no training data and is robust to changes in algorithm parameters. At an FPR of <10%, the algorithm can detect CNVs with five probes on the 385K platform and three on the 2.1M and 3.1M platforms, resulting in effective resolutions of 24 kb, 2–5 kb and 1 kb, respectively.
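HMM-based CNV callers such as the one described above segment probe log2 ratios into hidden copy-number states. A minimal, generic sketch of that idea is a three-state Gaussian HMM decoded with the Viterbi algorithm. It does not reproduce wuHMM, which additionally uses sequence divergence information. The state means, noise standard deviation and self-transition probability are illustrative assumptions, not values from the paper.

```python
import math

STATES = ("loss", "normal", "gain")
# Assumed emission means for aCGH log2 ratios; real values are platform-specific.
MEANS = {"loss": -1.0, "normal": 0.0, "gain": 0.58}


def log_gauss(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(log_ratios, sd=0.3, p_stay=0.99):
    """Most likely copy-number state per probe under a 3-state Gaussian HMM."""
    n_states = len(STATES)
    p_switch = (1 - p_stay) / (n_states - 1)
    log_trans = {(a, b): math.log(p_stay if a == b else p_switch)
                 for a in STATES for b in STATES}
    # Uniform initial state distribution.
    score = {s: math.log(1 / n_states) + log_gauss(log_ratios[0], MEANS[s], sd)
             for s in STATES}
    back = []  # back-pointers for each probe after the first
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[(p, s)])
            new_score[s] = score[prev] + log_trans[(prev, s)] + log_gauss(x, MEANS[s], sd)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

The self-transition probability `p_stay` acts as a penalty on opening a new segment. Making it closer to 1 requires more consecutive deviating probes before a CNV is called. This is one way to see why effective resolution in the abstract is stated as a minimum number of probes rather than a base-pair length.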

10.

Background

Somatically acquired structural variations (SVs) and copy number variations (CNVs) can induce genetic changes that are directly related to tumorigenesis. Somatic SV/CNV detection using next-generation sequencing (NGS) data still faces major challenges introduced by tumor sample characteristics such as ploidy, heterogeneity, and purity. A simulated cancer genome with known SVs and CNVs can serve as a benchmark for evaluating the performance of existing somatic SV/CNV detection tools and for developing new methods.

Results

SCNVSim is a tool for simulating somatic CNVs and structural variations (SVs). In addition to multiple types of SV and CNV events, the tool can simulate important features of tumor samples, including aneuploidy, heterogeneity and purity.

Conclusions

SCNVSim generates the genomes of a cancer cell population with detailed information on copy number status, loss of heterozygosity (LOH), and event breakpoints, which is essential for developing and evaluating somatic CNV and SV detection methods in cancer genomics studies.

11.

Background  

Algorithms and software for CNV detection have been developed, but they detect CNV regions sample by sample, with individual-specific breakpoints, whereas common CNV regions are likely to occur at the same genomic locations across different individuals in a homogeneous population. Current algorithms to detect common CNV regions do not account for the varying reliability of the individual CNVs, typically reported as confidence scores by SNP-based CNV detection algorithms. General methodologies for identifying these recurrent regions, especially those directed at SNP arrays, are still needed.

12.
Genomics, 2020, 112(2): 1245-1256
Genetic laboratories use custom-commercial targeted next-generation sequencing (tg-NGS) assays to identify disease-causing variants. Although the high coverage achieved with these tests allows the detection of copy number variants (CNVs), which account for an important proportion of the genetic burden in human diseases, an easy-to-use tool for automatic CNV detection is still lacking. This article presents PattRec, a new CNV detection tool optimized for tg-NGS data. PattRec was evaluated using a wide range of data, and its performance was compared with that of other CNV detection tools. The software includes features for selecting optimal controls, discarding polymorphic CNVs prior to analysis, and filtering out deletions based on SNV zygosity, and it automatically creates an in-house CNV database. No high-level bioinformatics expertise is needed, and users can choose a color-coded xlsx output that helps to prioritize potentially pathogenic CNVs. PattRec is provided as a Java-based GUI, freely available online: https://github.com/irotero/PattRec.

13.
Copy number variation (CNV) is a major genetic polymorphism contributing to genetic diversity and human evolution. Clinical application of CNVs for diagnostic purposes largely depends on sufficient population CNV data for accurate interpretation. CNVs from the general population in currently available databases help classify CNVs as benign or of uncertain clinical significance. Earlier studies of CNV distribution in several populations worldwide showed that a significant fraction of CNVs are population-specific. In this study, we characterized and analyzed CNVs in 3,017 unrelated Thai individuals genotyped with the Illumina Human610, Illumina HumanOmniExpress, or Illumina HumanHap550v3 platform. We employed hidden Markov model and circular binary segmentation methods to identify CNVs, extracted 23,458 CNVs consistently identified by both algorithms, and cataloged these high-confidence CNVs in our publicly available Thai CNV database. Analysis of CNVs in the Thai population identified a median of eight autosomal CNVs per individual. Most CNVs (96.73%) did not overlap with any known chromosomal imbalance syndromes documented in the DECIPHER database. When compared with CNVs in the 11 HapMap3 populations, CNVs found in the Thai population shared several characteristics with those characterized in HapMap3. Common CNVs in Thais had frequencies similar to those in the HapMap3 populations, and all high-frequency CNVs (>20%) found in Thai individuals were also identified in HapMap3. The majority of CNVs discovered in the Thai population, however, were of low frequency or uniquely identified in Thais. When hierarchical clustering was performed on CNV frequencies, the data clustered into Africans, Europeans, and Asians, in line with clustering based on single nucleotide polymorphism (SNP) data.
As CNV profiles differ by population of origin, our population-specific reference database will serve as a valuable addition to existing resources for investigating the clinical significance of CNVs in Thais and related ethnicities.

14.
Array CGH enables the detection of pathogenic copy number variants (CNVs) in 5–15% of individuals with intellectual disability (ID), making it a promising tool for uncovering ID candidate genes. However, most CNVs encompass multiple genes, making it difficult to identify key disease gene(s) underlying ID etiology. Using array CGH we identified 47 previously unreported unique CNVs in 45/255 probands. We prioritized ID candidate genes using five bioinformatic gene prioritization web tools. Gene priority lists were created by comparing integral genes from each CNV from our ID cohort with sets of training genes specific either to ID or randomly selected. Our findings suggest that different training sets alter gene prioritization only moderately; however, only the ID gene training set resulted in significant enrichment of genes with nervous system function (19%) in prioritized versus non-prioritized genes from the same de novo CNVs (7%, p < 0.05). This enrichment further increased to 31% when the five web tools were used in concert and included genes within mitogen-activated protein kinase (MAPK) and neuroactive ligand-receptor interaction pathways. Gene prioritization web tools enrich for genes with relevant function in ID and more readily facilitate the selection of ID candidate genes for functional studies, particularly for large CNVs.

15.
While numerous studies have implicated copy number variants (CNVs) in a range of neurological phenotypes, the impact relative to disease severity has been difficult to ascertain due to small sample sizes, lack of phenotypic details, and heterogeneity in platforms used for discovery. Using a customized microarray enriched for genomic hotspots, we assayed for large CNVs among 1,227 individuals with various neurological deficits including dyslexia (376), sporadic autism (350), and intellectual disability (ID) (501), as well as 337 controls. We show that the frequency of large CNVs (>1 Mbp) is significantly greater for ID-associated phenotypes compared to autism (p = 9.58 × 10⁻¹¹, odds ratio = 4.59), dyslexia (p = 3.81 × 10⁻¹⁸, odds ratio = 14.45), or controls (p = 2.75 × 10⁻¹⁷, odds ratio = 13.71). There is a striking difference in the frequency of rare CNVs (>50 kbp) in autism (10%, p = 2.4 × 10⁻⁶, odds ratio = 6) or ID (16%, p = 3.55 × 10⁻¹², odds ratio = 10) compared to dyslexia (2%), with essentially no difference in large CNV burden between dyslexia patients and controls. Rare CNVs were more likely to arise de novo (64%) in ID when compared to autism (40%) or dyslexia (0%). We observed a significantly increased large CNV burden in individuals with ID and multiple congenital anomalies (MCA) compared to ID alone (p = 0.001, odds ratio = 2.54). Our data suggest that large CNV burden positively correlates with the severity of childhood disability: ID with MCA being most severely affected and dyslexics being indistinguishable from controls. When autism without ID was considered separately, the increase in CNV burden was modest compared to controls (p = 0.07, odds ratio = 2.33).
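The odds ratios quoted above compare the odds of carrying a large CNV between two groups. Below is a minimal sketch of that calculation with a Woolf (log-scale) 95% confidence interval. The function name and the counts in the usage example are invented for illustration. The abstract does not state which statistical test produced its p-values, so none is computed here.

```python
import math


def odds_ratio(carriers_a, total_a, carriers_b, total_b):
    """Odds ratio of CNV carriage in group A versus group B.

    Also returns a Woolf 95% confidence interval computed on the log scale.
    Assumes every cell of the 2x2 table is non-zero.
    """
    a, b = carriers_a, total_a - carriers_a  # group A: carriers, non-carriers
    c, d = carriers_b, total_b - carriers_b  # group B: carriers, non-carriers
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lo, hi)
```

With hypothetical counts of 20 carriers out of 100 in one group and 5 out of 100 in another, `odds_ratio(20, 100, 5, 100)` gives an odds ratio of (20 × 95) / (80 × 5) = 4.75.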

16.
Copy number variations (CNVs) are one of the main sources of variability in the human genome. Many CNVs are associated with various diseases, including cardiovascular disease. In addition to hybridization-based methods, next-generation sequencing (NGS) technologies are increasingly used for CNV discovery. However, computational methods applicable to NGS data are still limited. We developed a novel CNV calling method based on outlier detection that is applicable to small cohorts, which is of particular interest for the discovery of individual CNVs within families, de novo CNVs in trios and/or small cohorts of specific phenotypes such as rare diseases. Approximately 7,000 rare diseases are currently known, which collectively affect ∼6% of the population. For our method, we applied Dixon's Q test to detect outliers and used a Hidden Markov Model for their assessment. The method can be used for data obtained by exome and targeted resequencing. We evaluated our outlier-based method in comparison with the CNV calling tool CoNIFER using eight HapMap exome samples and subsequently applied both methods to targeted resequencing data of patients with Tetralogy of Fallot (TOF), the most common cyanotic congenital heart disease. In both the HapMap samples and the TOF cases, our method outperformed CoNIFER, identifying more true positive CNVs. Called CNVs in TOF cases were validated by qPCR, and HapMap CNVs were confirmed with available array-CGH data. In the TOF patients, we found four copy number gains affecting three genes, of which two are important regulators of heart development (NOTCH1, ISL1) and one is located in a region associated with cardiac malformations (PRODH at 22q11). In summary, we present a novel CNV calling method based on outlier detection, which will be of particular interest for the analysis of de novo CNVs in trios or individual CNVs in cohorts of up to 30 individuals.
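Dixon's Q test, used above to flag outlying samples, compares the gap between a suspect value and its nearest neighbour with the range of all values. Below is a minimal sketch applied to one target region. It assumes the input is a hypothetical list of per-sample normalized read depths. It uses commonly tabulated 95% critical values for the basic ratio, which cover only 3 to 10 samples. Dixon's variants for larger samples, and the HMM assessment step the authors add, are not covered.

```python
# Commonly tabulated critical values of Dixon's Q (95% confidence)
# for sample sizes 3 to 10.
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
       7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q_outlier(values, q_crit=None):
    """Return the lowest or highest value if it is a Dixon's Q outlier, else None.

    Q = gap / range, where gap is the distance from the suspect extreme value
    to its nearest neighbour in sorted order.
    """
    x = sorted(values)
    n = len(x)
    if q_crit is None:
        q_crit = Q95[n]  # KeyError outside the tabulated range 3..10
    spread = x[-1] - x[0]
    if spread == 0:
        return None
    q_low = (x[1] - x[0]) / spread
    q_high = (x[-1] - x[-2]) / spread
    # Test whichever extreme is more isolated.
    suspect, q = (x[0], q_low) if q_low >= q_high else (x[-1], q_high)
    return suspect if q > q_crit else None
```

For hypothetical normalized depths `[1.0, 0.98, 1.02, 1.01, 0.5]`, the lowest value has Q = 0.48 / 0.52 ≈ 0.92, above the 0.710 cutoff for five samples. That sample would be a candidate heterozygous deletion at this target.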

17.
PLoS ONE, 2013, 8(3)
Tourette syndrome (TS) is a neuropsychiatric disorder with a strong genetic component. However, the genetic architecture of TS remains uncertain. Copy number variation (CNV) has been shown to contribute to the genetic make-up of several neurodevelopmental conditions, including schizophrenia and autism. Here we describe CNV calls using SNP chip genotype data from an initial sample of 210 TS cases and 285 controls ascertained in two Latin American populations. After extensive quality control, we found that cases (N = 179) have a significant excess (P = 0.006) of large CNV (>500 kb) calls compared with controls (N = 234). Among 24 large CNVs seen only in the cases, we observed four duplications of the COL8A1 gene region. We also found two cases with ∼400 kb deletions involving NRXN1, a gene previously implicated in neurodevelopmental disorders, including TS. Follow-up using multiplex ligation-dependent probe amplification (including 53 additional TS cases) validated the CNV calls and identified additional patients with rearrangements in COL8A1 and NRXN1, but none in controls. Examination of available parents indicates that two of the three NRXN1 deletions detected in the TS cases are de novo mutations. Our results are consistent with the proposal that rare CNVs play a role in TS aetiology and suggest a possible role for rearrangements in the COL8A1 and NRXN1 gene regions.

18.
Several computer programs are available for detecting copy number variants (CNVs) using genome-wide SNP arrays. We evaluated the performance of four CNV detection software suites (Birdsuite, Partek, HelixTree, and PennCNV-Affy) in the identification of both rare and common CNVs. Each program's performance was assessed in two ways. The first was its recovery rate, i.e., its ability to call 893 CNVs previously identified in eight HapMap samples by paired-end sequencing of whole-genome fosmid clones, and 51,440 CNVs identified by array Comparative Genome Hybridization (aCGH) followed by validation procedures, in 90 HapMap CEU samples. The second was its performance in calling rare and common CNVs in the Bipolar Genome Study (BiGS) data set (1001 bipolar cases and 1033 controls, all of European ancestry) as measured by the Affymetrix SNP 6.0 array. Accuracy in calling rare CNVs was assessed by positive predictive value, based on the proportion of rare CNVs validated by quantitative real-time PCR (qPCR), while accuracy in calling common CNVs was assessed by false positive/false negative rates based on qPCR validation results from a subset of common CNVs. Birdsuite recovered the highest percentages of known HapMap CNVs containing >20 markers in two reference CNV datasets. The recovery rate increased as CNV frequency decreased. In the tested rare CNV data, Birdsuite and Partek had higher positive predictive values than the other software suites. In a test of three common CNVs in the BiGS dataset, Birdsuite's calls were 98.8% consistent with qPCR quantification in one CNV region, but accuracy in the other two regions was unacceptably low. We found relatively poor consistency between the two "gold standards," the sequence data of Kidd et al. and the aCGH data of Conrad et al. Algorithms for calling CNVs, especially common ones, need substantial improvement, and a "gold standard" for detection of CNVs remains to be established.
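The accuracy measures used in this evaluation reduce to simple ratios. The sketch below computes the positive predictive value, the false positive and false negative rates, and overall concordance. It assumes qPCR results are available as a per-sample carrier/non-carrier label. That input format is hypothetical: real qPCR measurements would first need to be thresholded into copy-number classes. All function names are likewise hypothetical.

```python
def positive_predictive_value(n_validated, n_tested):
    """Fraction of tested CNV calls confirmed by qPCR."""
    return n_validated / n_tested


def error_rates(calls, truth):
    """False positive and false negative rates for binary carrier calls.

    calls, truth: dicts mapping sample ID -> bool (carrier or not), where
    truth holds the qPCR result. FP rate = FP / all qPCR negatives;
    FN rate = FN / all qPCR positives.
    """
    fp = sum(1 for s, t in truth.items() if calls[s] and not t)
    fn = sum(1 for s, t in truth.items() if not calls[s] and t)
    n_neg = sum(1 for t in truth.values() if not t)
    n_pos = sum(1 for t in truth.values() if t)
    return fp / n_neg, fn / n_pos


def concordance(calls, truth):
    """Fraction of samples whose call agrees with the qPCR result."""
    return sum(calls[s] == t for s, t in truth.items()) / len(truth)
```

A figure such as the 98.8% consistency reported for one region corresponds to the `concordance` measure. The false positive and false negative rates separate the two kinds of disagreement, which matters when a common CNV has very unequal numbers of carriers and non-carriers.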

19.
Large rare copy number variants (CNVs) have been recognized as significant genetic risk factors for the development of schizophrenia (SCZ). However, due to their low frequency (1∶150 to 1∶1000) among patients, large sample sizes are needed to detect an association between specific CNVs and SCZ. So far, the majority of genome-wide CNV analyses have focused on reporting only CNVs that reached a significant P-value within the study cohort and merely confirmed the frequency of already-established risk-carrying CNVs. As a result, CNVs with a very low frequency that might be relevant for SCZ susceptibility are lost for secondary analyses. In this study, we provide a concise collection of high-quality CNVs in a large German sample consisting of 1,637 patients with SCZ or schizoaffective disorder and 1,627 controls. All individuals were genotyped on Illumina's BeadChips and putative CNVs were identified using QuantiSNP and PennCNV. Only those CNVs that were detected by both programs and spanned ≥30 consecutive SNPs were included in the data collection and downstream analyses (2,366 CNVs, 0.73 CNVs per individual). The genome-wide analysis did not reveal a specific association between a previously unknown CNV and SCZ. However, the group of CNVs previously reported to be associated with SCZ was more frequent in our patients than in the controls. The publication of our dataset will serve as a unique, easily accessible, high-quality CNV data collection for other research groups. The dataset could be useful for the identification of new disease-relevant CNVs that are currently overlooked due to their very low frequency and lack of power for their detection in individual studies.

20.
Journal of Genetics and Genomics, 2021, 48(12): 1070-1080
Premenstrual dysphoric disorder (PMDD) affects nearly 5% of women of reproductive age. Symptomatic heterogeneity, together with largely unknown genetics, has greatly hindered its effective treatment. In the present study, semi-supervised clustering of genomic sequencing-based copy number variations (CNVs), called from 100 kb windows of white blood cell DNA sequence, segregated patient genomes into D and V groups, which correlated with the depression and invasion clinical types, respectively, with 89.0% consistency. Diagnostic CNV features selected using a correlation-based machine learning method enabled classification of the CNVs into the D group, V group, total patient group, and control group with an average accuracy of 83.0%. The power of the diagnostic CNV features averaged 0.98, suggesting that these features could be used for the molecular diagnosis of the major clinical types of PMDD. The concordance between CNV profiles and clinical types supports the validity of symptom-based diagnosis for differentiating between the two major clinical types of PMDD. It also supports a predominantly genetic basis for PMDD, with multiple susceptibility genes/pathways overlapping the diagnostic CNV features and thus indicated as involved in PMDD etiology.
