Similar articles
 Found 20 similar articles (search time: 125 ms)
1.
Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in ~2500 individuals by using Illumina SNP data, with an emphasis on “hotspots” prone to recurrent mutations. We find variants larger than 500 kb in 5%–10% of individuals and variants greater than 1 Mb in 1%–2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%–1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.

2.
Structural variation (SV) is a significant component of the genetic etiology of both neurodevelopmental and psychiatric disorders; however, routine guidelines for clinical genetic screening have been established only in the former category. Genome-wide chromosomal microarray (CMA) can detect genomic imbalances such as copy-number variants (CNVs), but balanced chromosomal abnormalities (BCAs) still require karyotyping for clinical detection. Moreover, submicroscopic BCAs and subarray threshold CNVs are intractable, or cryptic, to both CMA and karyotyping. Here, we performed whole-genome sequencing using large-insert jumping libraries to delineate both cytogenetically visible and cryptic SVs in a single test among 30 clinically referred youth representing a range of severe neuropsychiatric conditions. We detected 96 SVs per person on average that passed filtering criteria above our highest-confidence resolution (6,305 bp) and an additional 111 SVs per genome below this resolution. These SVs rearranged 3.8 Mb of genomic sequence and resulted in 42 putative loss-of-function (LoF) or gain-of-function mutations per person. We estimate that 80% of the LoF variants were cryptic to clinical CMA. We found myriad complex and cryptic rearrangements, including a “paired” duplication (360 kb, 169 kb) that flanks a 5.25 Mb inversion that appears in 7 additional cases from clinical CNV data among 47,562 individuals. Following convergent genomic profiling of these independent clinical CNV data, we interpreted three SVs to be of potential clinical significance. These data indicate that sequence-based delineation of the full SV mutational spectrum warrants exploration in youth referred for neuropsychiatric evaluation and clinical diagnostic SV screening more broadly.

3.
Despite considerable excitement over the potential functional significance of copy-number variants (CNVs), we still lack knowledge of the fine-scale architecture of the large majority of CNV regions in the human genome. In this study, we used a high-resolution array-based comparative genomic hybridization (aCGH) platform that targeted known CNV regions of the human genome at approximately 1 kb resolution to interrogate the genomic DNAs of 30 individuals from four HapMap populations. Our results revealed that 1020 of 1153 CNV loci (88%) were actually smaller in size than what is recorded in the Database of Genomic Variants based on previously published studies. A reduction in size of more than 50% was observed for 876 CNV regions (76%). We conclude that the total genomic content of currently known common human CNVs is likely smaller than previously thought. In addition, approximately 8% of the CNV regions observed in multiple individuals exhibited genomic architectural complexity in the form of smaller CNVs within larger ones and CNVs with interindividual variation in breakpoints. Future association studies that aim to capture the potential influences of CNVs on disease phenotypes will need to consider how to best ascertain this previously uncharacterized complexity.

4.
Copy number variation refers to regions along chromosomes that harbor a type of structural variation, such as duplications or deletions. Copy number variants (CNVs) play a role in many important traits as well as in genetic diversity. Previous analyses of chickens using array comparative genomic hybridization or single-nucleotide polymorphism chip assays have been performed on various breeds and genetic lines to discover CNVs. In this study, we assessed individuals from two highly inbred (inbreeding coefficient > 99.99%) lines, Leghorn G-B2 and Fayoumi M15.2, to discover novel CNVs in chickens. These lines have been previously studied for disease resistance, and to our knowledge, this represents the first global assessment of CNVs in the Fayoumi breed. Genomic DNA from individuals was examined using the Agilent chicken 244 K comparative genomic hybridization array and quantitative PCR. We identified a total of 273 CNVs overall, with 112 CNVs being novel and not previously reported. Quantitative PCR using the standard curve method validated a subset of our array data. Through enrichment analysis of genes within CNV regions, we observed multiple chromosomes, terms and pathways that were significantly enriched, largely dealing with the major histocompatibility complex and immune responsiveness. Using an additional round of computational and statistical analysis with a different bioinformatic pipeline, we identified 43 CNVs among these as high-confidence regions, 14 of which were found to be novel. We further compared and contrasted individuals of the two inbred lines to discover regions that have a significant difference in copy number between lines. A total of 40 regions had significant deletions or duplications between the lines. Gene Ontology analysis of genomic regions containing CNVs between lines also was performed. This between-line candidate CNV list will be useful in studies with these two unique genetic lines, which may harbor variations that underlie quantitative trait loci for disease resistance and other important traits. Through the global discovery of novel CNVs in chicken, these data also provide resources for further genetic and functional genomics studies.

5.
Acquisition of genetic material from viruses by their hosts can generate inter-host structural genome variation. We developed computational tools enabling us to study virus-derived structural variants (SVs) in population-scale whole genome sequencing (WGS) datasets and applied them to 3,332 humans. Although SVs had already been cataloged in these subjects, we found previously-overlooked virus-derived SVs. We detected non-germline SVs derived from squirrel monkey retrovirus (SMRV), human immunodeficiency virus 1 (HIV-1), and human T lymphotropic virus (HTLV-1); these variants are attributable to infection of the sequenced lymphoblastoid cell lines (LCLs) or their progenitor cells and may impact gene expression results and the biosafety of experiments using these cells. In addition, we detected new heritable SVs derived from human herpesvirus 6 (HHV-6) and human endogenous retrovirus-K (HERV-K). We report the first solo-direct repeat (DR) HHV-6 likely to reflect DR rearrangement of a known full-length endogenous HHV-6. We used linkage disequilibrium between single nucleotide variants (SNVs) and variants in reads that align to HERV-K, which often cannot be mapped uniquely using conventional short-read sequencing analysis methods, to locate previously-unknown polymorphic HERV-K loci. Some of these loci are tightly linked to trait-associated SNVs, some are in complex genome regions inaccessible by prior methods, and some contain novel HERV-K haplotypes likely derived from gene conversion from an unknown source or introgression. These tools and results broaden our perspective on the coevolution between viruses and humans, including ongoing virus-to-human gene transfer contributing to genetic variation between humans.

6.
7.
Copy number variations (CNVs) constitute an important class of variation in the human genome, and interpreting their pathogenicity in light of differing frequencies across populations remains a challenge for geneticists. Since the CNV databases are predominantly composed of European and non-admixed individuals, and the Brazilian genetic constitution is admixed and ethnically diverse, diagnostic screening of Brazilian variants is greatly hampered by the lack of population references. We analyzed a clinical sample of 268 Brazilian individuals, including patients with neurodevelopmental disorders and/or congenital malformations. The pathogenicity of CNVs was classified according to their gene content and overlap with known benign and pathogenic variants. A total of 1,504 autosomal CNVs (1,207 gains and 297 losses) were classified as benign (92.9%), likely benign (1.6%), VUS (2.6%), likely pathogenic (0.2%) and pathogenic (2.7%). Some of the CNVs were recurrent and had increased frequency in our sample when compared with open population resources of structural variants: 14q32.33, 22q11.22, 1q21.1, and 1p36.32 gains. Thus, these highly recurrent CNVs classified as likely benign or VUS were considered non-pathogenic in our Brazilian sample. This study shows the relevance of introducing CNV data from diverse cohorts to improve the interpretation of the clinical impact of genomic variations.

8.
Extensive copy-number variation of the human olfactory receptor gene family   Total citations: 3, self-citations: 0, citations by others: 3
As much as a quarter of the human genome has been reported to vary in copy number between individuals, including regions containing about half of the members of the olfactory receptor (OR) gene family. We have undertaken a detailed study of copy-number variation of ORs to elucidate the selective and mechanistic forces acting on this gene family and the true impact of copy-number variation on human OR repertoires. We argue that the properties of copy-number variants (CNVs) and other sets of large genomic regions violate the assumptions of statistical methods that are commonly used in the assessment of gene enrichment. Using more appropriate methods, we provide evidence that OR enrichment in CNVs is not due to positive selection but is because of OR preponderance in segmentally duplicated regions, which are known to be frequently copy-number variable, and because purifying selection against CNVs is lower in OR-containing regions than in regions containing essential genes. We also combine multiplex ligation-dependent probe amplification (MLPA) and PCR to assay the copy numbers of 37 candidate CNV ORs in a panel of ~50 human individuals. We confirm copy-number variation of 18 ORs but find no variation in this human-diversity panel for 16 other ORs, highlighting the caveat that reported intervals often overrepresent true CNVs. The copy-number variation we describe is likely to underpin significant variation in olfactory abilities among human individuals. Finally, we show that both homology-based and homology-independent processes have played a recent role in remodeling the OR family.

9.
DNA variants, such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), are unevenly distributed across the human genome. Currently, dbSNP contains more than 6 million human SNPs, and whole-genome genotyping arrays can assay more than 4 million of them simultaneously. In our study, we first asked whether the assays used in published genome-wide association studies (GWASs) cover all regions of the genome well. Using dbSNP build 135 data, we identified 50 genomic regions longer than 100 kb that do not contain any common SNPs, i.e., those with minor allele frequency (MAF) ≥ 1%. Second, because conserved regions are generally of functional importance, we tested genes in those large genomic regions without common SNPs. We found 97 genes, which were enriched for reproductive function. In addition, we further filtered out regions with CNVs listed in the Database of Genomic Variants (DGV), segmental duplications from the Human Genome Project and common variants identified by personal genome sequencing (UCSC). No region survived this filtering. Our analysis suggests that, while there may not be many large genomic regions free of common variants, there are still some “holes” in the current human genomic map for common SNPs. Because GWASs have focused only on common SNPs, interpretation of GWAS results should take this limitation into account. In particular, two recent GWASs of fertility may be incomplete due to the map deficit. Additional SNP discovery efforts should pay close attention to these regions.
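
The region scan described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `(position, maf)` input format, and the use of 1-based inclusive coordinates are assumptions made here for illustration. It keeps SNPs with MAF ≥ 1% and reports every SNP-free stretch longer than 100 kb on one chromosome.

```python
from typing import List, Tuple


def common_snp_positions(snps: List[Tuple[int, float]], min_maf: float = 0.01) -> List[int]:
    """Keep the positions of SNPs whose minor allele frequency is >= min_maf.

    `snps` is a hypothetical list of (1-based position, MAF) pairs for one chromosome.
    """
    return sorted(pos for pos, maf in snps if maf >= min_maf)


def regions_without_common_snps(
    positions: List[int],
    chrom_length: int,
    min_length: int = 100_000,
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) intervals longer than min_length
    that contain no common SNP.

    `positions` must be sorted; sentinels at 0 and chrom_length + 1 let the
    chromosome ends be handled like any other gap.
    """
    boundaries = [0] + list(positions) + [chrom_length + 1]
    gaps = []
    for left, right in zip(boundaries, boundaries[1:]):
        length = right - left - 1  # number of SNP-free bases strictly between the two
        if length > min_length:
            gaps.append((left + 1, right - 1))
    return gaps


# Toy data (made up): the SNP at 200,000 is too rare to count as common.
snps = [(50_000, 0.20), (60_000, 0.05), (200_000, 0.001), (300_000, 0.12)]
common = common_snp_positions(snps)
print(regions_without_common_snps(common, chrom_length=450_000))
```

On real data the scan would run per chromosome, and the later filtering steps in the abstract (CNVs, segmental duplications, personal-genome variants) would be applied to the resulting intervals.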

10.
Array-based methods have enabled the detection of many genomic gains and losses. These are referred to as copy number variants (CNVs) and comprise up to 13% of the human genome. Based on their breakpoints and modes of formation, CNVs are termed recurrent or nonrecurrent. Recurrent CNVs are flanked by low copy repeats and are of a fixed size. They arise as a result of misalignment during meiosis by a mechanism named nonallelic homologous recombination. Several such recurrent CNVs have been linked to human diseases. Nonrecurrent CNVs, which are not flanked by low copy repeats, are of variable size and may arise via mechanisms like nonhomologous end joining and replication-based mechanisms described by the fork stalling and template switching and microhomology-mediated break-induced replication models. It is becoming clear that most disease-causing CNVs are nonrecurrent and generally arise via replication-based mechanisms. Furthermore, it is now appreciated that genomic features other than low copy repeats play a role in the formation of nonrecurrent CNVs. This review will discuss the different mechanisms of CNV formation and how high resolution analyses of CNV breakpoints have added to our knowledge of their precise structure.

11.
Structural variation is thought to play a major etiological role in the development of autism spectrum disorders (ASDs), and numerous studies documenting the relevance of copy number variants (CNVs) in ASD have been published since 2006. To determine if large ASD families harbor high-impact CNVs that may have broader impact in the general ASD population, we used the Affymetrix genome-wide human SNP array 6.0 to identify 153 putative autism-specific CNVs present in 55 individuals with ASD from 9 multiplex ASD pedigrees. To evaluate the actual prevalence of these CNVs, as well as 185 CNVs reportedly associated with ASD in published studies, many of which are insufficiently powered, we designed a custom Illumina array and used it to interrogate these CNVs in 3,000 ASD cases and 6,000 controls. Additional single nucleotide variants (SNVs) on the array identified 25 CNVs that we did not detect in our family studies at the standard SNP array resolution. After molecular validation, our results demonstrated that 15 CNVs identified in high-risk ASD families also were found in two or more ASD cases with odds ratios greater than 2.0, strengthening their support as ASD risk variants. In addition, of the 25 CNVs identified using SNV probes on our custom array, 9 also had odds ratios greater than 2.0, suggesting that these CNVs also are ASD risk variants. Eighteen of the validated CNVs have not been reported previously in individuals with ASD and three have only been observed once. Finally, we confirmed the association of 31 of 185 published ASD-associated CNVs in our dataset with odds ratios greater than 2.0, suggesting they may be of clinical relevance in the evaluation of children with ASDs. Taken together, these data provide strong support for the existence and application of high-impact CNVs in the clinical genetic evaluation of children with ASD.
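
The odds-ratio threshold used in this abstract can be made concrete with a short Python sketch. The carrier counts below are invented for illustration; only the cohort sizes (3,000 cases, 6,000 controls) and the 2.0 cutoff come from the abstract. The 0.5 continuity correction for empty cells (Haldane-Anscombe) is a choice made here, not something the abstract states.

```python
def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Odds ratio of carrying a CNV in cases versus controls.

    Builds the 2x2 table (carriers / non-carriers by cases / controls).
    If any cell is zero, 0.5 is added to every cell so the ratio stays finite
    (an assumption of this sketch, not a detail taken from the study).
    """
    a = float(case_carriers)
    b = float(case_total - case_carriers)
    c = float(control_carriers)
    d = float(control_total - control_carriers)
    if min(a, b, c, d) == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


# Hypothetical CNV seen in 6 of 3,000 cases and 4 of 6,000 controls.
or_value = odds_ratio(6, 3000, 4, 6000)
print(f"odds ratio = {or_value:.2f}, exceeds 2.0: {or_value > 2.0}")
```

With carrier counts this small, a confidence interval or exact test would be needed before drawing conclusions; the sketch only shows how the point estimate is formed.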

12.
Copy number variations (CNVs) are one of the main sources of variability in the human genome. Many CNVs are associated with various diseases including cardiovascular disease. In addition to hybridization-based methods, next-generation sequencing (NGS) technologies are increasingly used for CNV discovery. However, respective computational methods applicable to NGS data are still limited. We developed a novel CNV calling method based on outlier detection applicable to small cohorts, which is of particular interest for the discovery of individual CNVs within families, de novo CNVs in trios and/or small cohorts of specific phenotypes like rare diseases. Approximately 7,000 rare diseases are currently known, which collectively affect ∼6% of the population. For our method, we applied Dixon’s Q test to detect outliers and used a Hidden Markov Model for their assessment. The method can be used for data obtained by exome and targeted resequencing. We evaluated our outlier-based method in comparison to the CNV calling tool CoNIFER using eight HapMap exome samples and subsequently applied both methods to targeted resequencing data of patients with Tetralogy of Fallot (TOF), the most common cyanotic congenital heart disease. In both the HapMap samples and the TOF cases, our method is superior to CoNIFER, such that it identifies more true positive CNVs. Called CNVs in TOF cases were validated by qPCR and HapMap CNVs were confirmed with available array-CGH data. In the TOF patients, we found four copy number gains affecting three genes, of which two are important regulators of heart development (NOTCH1, ISL1) and one is located in a region associated with cardiac malformations (PRODH at 22q11). In summary, we present a novel CNV calling method based on outlier detection, which will be of particular interest for the analysis of de novo or individual CNVs in trios or cohorts up to 30 individuals, respectively.
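
As a rough illustration of the outlier-detection step named in this abstract, the Python sketch below computes the basic Dixon Q statistic (gap to the nearest neighbour divided by the range) for the lowest and highest value in a small set of per-sample read depths. It is not the published method: the input values, the function names, and the caller-supplied critical value `q_crit` are assumptions, and the Hidden Markov Model assessment is not sketched.

```python
from typing import List, Tuple


def dixon_q(values: List[float]) -> Tuple[float, float]:
    """Return (Q_low, Q_high): Dixon's Q for the smallest and largest value.

    Q = gap between the suspect value and its nearest neighbour, divided by
    the full range of the data.
    """
    if len(values) < 3:
        raise ValueError("Dixon's Q test needs at least three observations")
    s = sorted(values)
    spread = s[-1] - s[0]
    if spread == 0:
        return 0.0, 0.0  # all values identical, nothing can be an outlier
    return (s[1] - s[0]) / spread, (s[-1] - s[-2]) / spread


def flag_outliers(values: List[float], q_crit: float) -> List[float]:
    """Return the extreme values whose Q exceeds q_crit.

    q_crit must be looked up by the caller in a Dixon Q table for the
    sample size and confidence level in use.
    """
    q_low, q_high = dixon_q(values)
    flagged = []
    if q_low > q_crit:
        flagged.append(min(values))
    if q_high > q_crit:
        flagged.append(max(values))
    return flagged


# Made-up normalised read depths for one target region across seven samples.
depths = [0.98, 1.02, 1.00, 0.97, 1.03, 1.01, 1.60]
print(dixon_q(depths))
print(flag_outliers(depths, q_crit=0.6))  # 0.6 is an illustrative threshold only
```

A sample flagged at the high end would be a candidate copy number gain for that region, and one flagged at the low end a candidate loss.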

13.
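Whole-genome sequencing and its application in the research and diagnosis of genetic diseases   Total citations: 1, self-citations: 0, citations by others: 1
邵谦之, 姜毅, 吴金雨. 《遗传》 (Hereditas (Beijing)) 2014, 36(11): 1087-1098
Recently, as sequencing costs have continued to fall and data-analysis strategies have continued to improve, whole-genome sequencing (WGS) has been applied to the detection of causal genes in cancer, Mendelian disorders and complex diseases, and is gradually moving into clinical diagnosis. WGS can detect not only single-nucleotide variants (SNVs) and insertions/deletions (InDels) in coding and non-coding regions, but also copy number variation (CNV) and structural variation (SV) across the whole genome. This article describes in detail the standard bioinformatics analysis pipeline and methods for WGS and its applications in disease research and clinical diagnosis, and gives an overview of the applications of and research progress in WGS in medical genetics, as well as the challenges facing data analysis.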

14.
Differences in genomic structure between individuals are ubiquitous features of human genetic variation. Specific copy number variants (CNVs) have been associated with susceptibility to numerous complex psychiatric disorders, including attention-deficit-hyperactivity disorder, autism-spectrum disorders and schizophrenia. These disorders often display co-morbidity with low intelligence. Rare chromosomal deletions and duplications are associated with these disorders, so it has been suggested that these deletions or duplications may be associated with differences in intelligence. Here we investigate associations between large (≥500kb), rare (<1% population frequency) CNVs and both fluid and crystallized intelligence in community-dwelling older people. We observe no significant associations between intelligence and total CNV load. Examining individual CNV regions previously implicated in neuropsychological disorders, we find suggestive evidence that CNV regions around SHANK3 are associated with fluid intelligence as derived from a battery of cognitive tests. This is the first study to examine the effects of rare CNVs as called by multiple algorithms on cognition in a large non-clinical sample, and finds no effects of such variants on general cognitive ability.

15.
Although there are many methods available for inferring copy-number variants (CNVs) from next-generation sequence data, there remains a need for a system that is computationally efficient but that retains good sensitivity and specificity across all types of CNVs. Here, we introduce a new method, estimation by read depth with single-nucleotide variants (ERDS), and use various approaches to compare its performance to other methods. We found that for common CNVs and high-coverage genomes, ERDS performs as well as the best method currently available (Genome STRiP), whereas for rare CNVs and high-coverage genomes, ERDS performs better than any available method. Importantly, ERDS accommodates both unique and highly amplified regions of the genome and does so without requiring separate alignments for calling CNVs and other variants. These comparisons show that for genomes sequenced at high coverage, ERDS provides a computationally convenient method that calls CNVs as well as or better than any currently available method.

16.
Nonallelic homologous recombination (NAHR), occurring between low-copy repeats (LCRs) >10 kb in size and sharing >97% DNA sequence identity, is responsible for the majority of recurrent genomic rearrangements in the human genome. Recent studies have shown that transposable elements (TEs) can also mediate recurrent deletions and translocations, indicating the features of substrates that mediate NAHR may be significantly less stringent than previously believed. Using >4 kb length and >95% sequence identity criteria, we analyzed the genome-wide distribution of long interspersed element (LINE) retrotransposons and their potential to mediate NAHR. We identified 17 005 directly oriented LINE pairs located <10 Mbp from each other as potential NAHR substrates, placing 82.8% of the human genome at risk of LINE–LINE-mediated instability. Cross-referencing these regions with CNVs in the Baylor College of Medicine clinical chromosomal microarray database of 36 285 patients, we identified 516 CNVs potentially mediated by LINEs. Using long-range PCR of five different genomic regions in a total of 44 patients, we confirmed that the CNV breakpoints in each patient map within the LINE elements. To additionally assess the scale of LINE–LINE/NAHR phenomenon in the human genome, we tested DNA samples from six healthy individuals on a custom aCGH microarray targeting LINE elements predicted to mediate CNVs and identified 25 LINE–LINE rearrangements. Our data indicate that LINE–LINE-mediated NAHR is widespread and under-recognized, and is an important mechanism of structural rearrangement contributing to human genomic variability.
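
The pair-selection criteria in this abstract (>4 kb length, >95% identity, direct orientation, <10 Mbp apart) can be sketched as a simple filter. This is an illustration only, not the authors' code: `LineElement`, `candidate_nahr_pairs`, and the caller-supplied `identity` function are hypothetical names, distance is taken here to mean the gap between the two elements, and computing sequence identity itself (which would require alignment) is left to the caller.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class LineElement:
    chrom: str
    start: int   # 1-based start coordinate
    end: int     # 1-based inclusive end coordinate
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def candidate_nahr_pairs(
    elements: List[LineElement],
    identity: Callable[[LineElement, LineElement], float],
    min_length: int = 4_000,
    min_identity: float = 0.95,
    max_distance: int = 10_000_000,
) -> List[Tuple[LineElement, LineElement]]:
    """Return directly oriented LINE pairs that meet the thresholds.

    `identity` is a caller-supplied function returning the fraction of
    identical bases between two elements (an assumption of this sketch).
    """
    long_enough = sorted(
        (e for e in elements if e.length > min_length),
        key=lambda e: (e.chrom, e.start),
    )
    pairs = []
    for a, b in combinations(long_enough, 2):  # a precedes b in sorted order
        if a.chrom != b.chrom or a.strand != b.strand:
            continue  # different chromosome, or inverted rather than direct orientation
        if b.start - a.end >= max_distance:
            continue  # too far apart
        if identity(a, b) > min_identity:
            pairs.append((a, b))
    return pairs


# Made-up elements; identity is stubbed to a constant for the example.
e0 = LineElement("chr1", 1_000, 7_000, "+")
e1 = LineElement("chr1", 500_000, 506_100, "+")
e2 = LineElement("chr1", 900_000, 906_000, "-")       # opposite strand
e3 = LineElement("chr1", 20_000_000, 20_006_000, "+")  # more than 10 Mbp away
e4 = LineElement("chr2", 1_000, 7_000, "+")            # different chromosome
e5 = LineElement("chr1", 2_000_000, 2_003_000, "+")    # shorter than 4 kb
pairs = candidate_nahr_pairs([e0, e1, e2, e3, e4, e5], identity=lambda a, b: 0.97)
print(pairs)
```

The all-pairs loop is quadratic; a genome-wide run would more likely sort elements per chromosome and stop scanning once the distance limit is exceeded.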

17.
Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise breakpoints, and in contrast to other methods, can resolve complex rearrangements. In total, we identified 277,243 SVs ranging in length from 1-23 kb. Validation using computational and experimental methods suggests that we achieve overall <6% false-positive rate and <10% false-negative rate in genomic regions that can be assembled, which outperforms other methods. Analysis of the SVs in the genomes of 106 individuals sequenced as part of the 1000 Genomes Project suggests that SVs account for a greater fraction of the diversity between individuals than do single-nucleotide polymorphisms (SNPs). These findings demonstrate that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation.

18.
19.
ABSTRACT: BACKGROUND: An important question in genetic studies is to determine those genetic variants, in particular CNVs, that are specific to different groups of individuals. This could help in elucidating differences in disease predisposition and response to pharmaceutical treatments. We propose a Bayesian model designed to analyze thousands of copy number variants (CNVs) where only a few of them are expected to be associated with a specific phenotype. RESULTS: The model is illustrated by analyzing three major human groups belonging to HapMap data. We also show how the model can be used to determine specific CNVs related to response to treatment in patients diagnosed with ovarian cancer. The model is also extended to address the problem of how to adjust for confounding covariates (e.g., population stratification). Through a simulation study, we show that the proposed model outperforms other approaches that are typically used to analyze this data when analyzing common copy-number polymorphisms (CNPs) or complex CNVs. We have developed an R package, called bayesGen, that implements the model and estimating algorithms. CONCLUSIONS: Our proposed model is useful to discover specific genetic variants when different subgroups of individuals are analyzed. The model can address studies with or without control group. By integrating all data in a unique model we can obtain a list of genes that are associated with a given phenotype as well as a different list of genes that are shared among the different subtypes of cases.

20.
Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variant detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.
