Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low-coverage. The assessed copy-number genotypes were highly concordant with our performed qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95-99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ~15% and ~20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. 
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
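The depth-of-coverage idea in the abstract above can be illustrated with a minimal sketch. This is not CopySeq's statistical framework, which the abstract does not describe in detail. It is only the naive ratio estimate that read-depth methods build on, and the function name, parameters, and the assumption of a diploid baseline are all hypothetical.

```python
def estimate_copy_number(region_depth: float,
                         baseline_depth: float,
                         baseline_copies: int = 2) -> int:
    """Naive integer copy-number estimate from read depth.

    Assumes (hypothetically) that read depth scales linearly with copy
    number and that `baseline_depth` was measured over regions known to
    carry `baseline_copies` copies (2 for autosomes in a diploid genome).
    """
    if baseline_depth <= 0:
        raise ValueError("baseline_depth must be positive")
    if region_depth < 0:
        raise ValueError("region_depth must be non-negative")
    # Scale the depth ratio by the baseline copy number, then round to
    # the nearest integer genotype.
    return round(region_depth / baseline_depth * baseline_copies)


# Half the baseline depth suggests a heterozygous deletion (one copy);
# near-zero depth suggests a homozygous deletion (zero copies).
heterozygous = estimate_copy_number(15.0, 30.0)
homozygous = estimate_copy_number(0.4, 30.0)
```

A real genotyper would also need to model sequencing noise, GC bias, and mappability, which is presumably where a statistical framework such as the one the abstract mentions comes in.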

2.
The development of high-throughput screening methods such as array-based comparative genome hybridization (array CGH) allows screening of the human genome for copy-number changes. Current array CGH strategies have limits of resolution that make small (less than a few tens of kilobases) gains or losses of genomic DNA difficult to detect. We report here a significant improvement in the resolution of array CGH, with the development of an array platform that utilizes single-stranded DNA array elements to accurately measure copy-number changes of individual exons in the human genome. Using this technology, we screened 31 patient samples across an array containing a total of 162 exons for five disease genes and detected copy-number changes, ranging from whole-gene deletions and duplications to single-exon deletions and duplications, in 100% of the cases. Our data demonstrate that it is possible to screen the human genome for copy-number changes with array CGH at a resolution that is 2 orders of magnitude higher than that previously reported.

3.
The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in the iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1,058-1,808 heterozygous single-nucleotide variants (SNVs), but no copy-number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ~50% of them are synonymous changes and the remainder are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction.

4.
Extensive copy-number variation of the human olfactory receptor gene family (cited by: 3 total, 0 self-citations, 3 by others)
As much as a quarter of the human genome has been reported to vary in copy number between individuals, including regions containing about half of the members of the olfactory receptor (OR) gene family. We have undertaken a detailed study of copy-number variation of ORs to elucidate the selective and mechanistic forces acting on this gene family and the true impact of copy-number variation on human OR repertoires. We argue that the properties of copy-number variants (CNVs) and other sets of large genomic regions violate the assumptions of statistical methods that are commonly used in the assessment of gene enrichment. Using more appropriate methods, we provide evidence that OR enrichment in CNVs is not due to positive selection but is because of OR preponderance in segmentally duplicated regions, which are known to be frequently copy-number variable, and because purifying selection against CNVs is lower in OR-containing regions than in regions containing essential genes. We also combine multiplex ligation-dependent probe amplification (MLPA) and PCR to assay the copy numbers of 37 candidate CNV ORs in a panel of ~50 human individuals. We confirm copy-number variation of 18 ORs but find no variation in this human-diversity panel for 16 other ORs, highlighting the caveat that reported intervals often overrepresent true CNVs. The copy-number variation we describe is likely to underpin significant variation in olfactory abilities among human individuals. Finally, we show that both homology-based and homology-independent processes have played a recent role in remodeling the OR family.

5.
Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotype differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to encode genes that are characteristic of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Additionally, human CNVs were observed to be unusually enriched in those protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease. Despite mouse CNVs also exhibiting a significant elevation in synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs either suggest that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or suggest that they were preferentially fixed as a consequence of the larger effective population size of wild mice. 
It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variations in the human population.

6.
Differences between individuals in the copy-number of whole genes have been found in every multicellular species examined thus far. Such differences result in unique complements of protein-coding genes in all individuals, and have been shown to underlie adaptive phenotypic differences. Here, we review the evidence for copy-number variants (CNVs), focusing on the methods used to detect them and the molecular mechanisms responsible for generating this type of variation. Although there are multiple technical and computational challenges inherent to these experimental methods, next-generation sequencing technologies are making such experiments accessible in any system with a sequenced genome. We further discuss the connection between copy-number variation within species and copy-number divergence between species, showing that these values are exactly what one would expect from similar comparisons of nucleotide polymorphism and divergence. We conclude by reviewing the growing body of evidence for natural selection on copy-number variants. While it appears that most genic CNVs, especially deletions, are quickly eliminated by selection, there are now multiple studies demonstrating a strong link between copy-number differences at specific genes and phenotypic differences in adaptive traits. We argue that a complete understanding of the molecular basis for adaptive natural selection necessarily includes the study of copy-number variation.

7.
Over recent years, small submicroscopic DNA copy-number variants (CNVs) have been highlighted as an important source of variation in the human genome, human phenotypic diversity and disease susceptibility. Consequently, there is a pressing need for the development of methods that allow the efficient, accurate and cheap measurement of genomic copy number polymorphisms in clinical cohorts. We have developed a simple competitive PCR based method to determine DNA copy number which uses the entire genome of a single chimpanzee as a competitor, thus eliminating the requirement for competitive sequences to be synthesized for each assay. This results in the requirement for only a single reference sample for all assays and dramatically increases the potential for large numbers of loci to be analysed in multiplex. In this study we establish proof of concept by accurately detecting previously characterized mutations at the PARK2 locus and then demonstrating the potential of quantitative interspecies competitive PCR (qicPCR) to accurately genotype CNVs in association studies by analysing chromosome 22q11 deletions in a sample of previously characterized patients and normal controls.

8.
The cause of mental retardation in one-third to one-half of all affected individuals is unknown. Microscopically detectable chromosomal abnormalities are the most frequently recognized cause, but gain or loss of chromosomal segments that are too small to be seen by conventional cytogenetic analysis has been found to be another important cause. Array-based methods offer a practical means of performing a high-resolution survey of the entire genome for submicroscopic copy-number variants. We studied 100 children with idiopathic mental retardation and normal results of standard chromosomal analysis, by use of whole-genome sampling analysis with Affymetrix GeneChip Human Mapping 100K arrays. We found de novo deletions as small as 178 kb in eight cases, de novo duplications as small as 1.1 Mb in two cases, and unsuspected mosaic trisomy 9 in another case. This technology can detect at least twice as many potentially pathogenic de novo copy-number variants as conventional cytogenetic analysis can in people with mental retardation.

9.
Methods for haplotyping and DNA copy-number typing of single cells are paramount for studying genomic heterogeneity and enabling genetic diagnosis. Before analyzing the DNA of a single cell by microarray or next-generation sequencing, a whole-genome amplification (WGA) process is required, but it substantially distorts the frequency and composition of the cell’s alleles. As a consequence, haplotyping methods suffer from error-prone discrete SNP genotypes (AA, AB, BB) and DNA copy-number profiling remains difficult because true DNA copy-number aberrations have to be discriminated from WGA artifacts. Here, we developed a single-cell genome analysis method that reconstructs genome-wide haplotype architectures as well as the copy-number and segregational origin of those haplotypes by employing phased parental genotypes and deciphering WGA-distorted SNP B-allele fractions via a process we coin haplarithmisis. We demonstrate that the method can be applied as a generic method for preimplantation genetic diagnosis on single cells biopsied from human embryos, enabling diagnosis of disease alleles genome wide as well as numerical and structural chromosomal anomalies. Moreover, meiotic segregation errors can be distinguished from mitotic ones.

10.
An evolution revolution provides further revelation (cited by: 1 total, 0 self-citations, 1 by others)
The extent of copy-number variation (CNV) in the human genome has been appreciated only recently. Nevertheless, for almost four decades, gene duplication has been a prevailing hypothesis for evolutionary change. Recently, gene CNV spanning 60 million years of human and primate evolution has been determined, enabling lineage-specific gene CNV to be identified. Primate lineage-specific gene CNV studies reveal that almost one third of all human genes exhibit a copy-number change in one or more primate species. Intriguingly, human lineage-specific gene amplification can be correlated to the emergence of human-specific traits such as cognition and endurance running.

11.
Recent studies have extensively examined the large-scale genetic variants in the human genome known as copy-number variations (CNVs), and the universality of CNVs in normal individuals, along with their functional importance, has been increasingly recognized. However, the absence of a method to accurately infer alleles or haplotypes within a CNV region from high-throughput experimental data hampers the finer analyses of CNV properties and applications to disease-association studies. Here we developed an algorithm to infer complex haplotypes within a CNV region by using data obtained from high-throughput experimental platforms. We applied this algorithm to experimental data and estimated the population frequencies of haplotypes that can yield information on both sequences and numbers of DNA copies. These results suggested that the analysis of such complex haplotypes is essential for accurately detecting genetic differences within a CNV region between population groups.

12.
Array comparative genomic hybridization (aCGH) has been used primarily to detect copy-number variants between two genomes. Here we report using aCGH to detect single nucleotide mutations on oligonucleotide microarrays with overlapping 50-mer probes. This technique represents a powerful method for rapidly detecting novel homozygous single nucleotide mutations in any organism with a sequenced reference genome.

13.
ABSTRACT: BACKGROUND: Eimeria is a genus of parasites in the same phylum (Apicomplexa) as human parasites such as Toxoplasma, Cryptosporidium and the malaria parasite Plasmodium. As an apicomplexan whose life-cycle involves a single host, Eimeria is a convenient model for understanding this group of organisms. Although the genomes of the Apicomplexa are diverse, that of Eimeria is unique in being composed of large alternating blocks of sequence with very different characteristics - an arrangement seen in no other organism. This arrangement has impeded efforts to fully sequence the genome of Eimeria, which remains the last of the major apicomplexans to be fully analyzed. In order to increase the value of the genome sequence data and aid in the effort to gain a better understanding of the Eimeria tenella genome, we constructed a whole genome map for the parasite. RESULTS: A total of 1245 contigs representing 70.0% of the whole genome assembly sequences (Wellcome Trust Sanger Institute) were selected and subjected to marker selection. Subsequently, 2482 HAPPY markers were developed and typed. Of these, 795 were considered as usable markers, and utilized in the construction of a HAPPY map. Markers developed from chromosomally-assigned genes were then integrated into the HAPPY map and this aided the assignment of a number of linkage groups to their respective chromosomes. BAC-end sequences and contigs from whole genome sequencing were also integrated to improve and validate the HAPPY map. This resulted in an integrated HAPPY map consisting of 60 linkage groups that covers approximately half of the estimated 60 Mb genome. Further analysis suggests that the segmental organization first seen in Chromosome 1 is present throughout the genome, with repeat-poor (P) regions alternating with repeat-rich (R) regions. Evidence of copy-number variation between strains was also uncovered. 
CONCLUSIONS: This paper describes the application of a whole genome mapping method to improve the assembly of the genome of E. tenella from shotgun data, and to help reveal its overall structure. A preliminary assessment of copy-number variation (extra or missing copies of genomic segments) between strains of E. tenella was also carried out. The emerging picture is of a very unusual genome architecture displaying inter-strain copy-number variation. We suggest that these features may be related to the known ability of this parasite to rapidly develop drug resistance.

14.
Histone gene number in relation to C-value in amphibians. (cited by: 2 total, 2 self-citations, 0 by others)
We have compared the number of copies of sequences complementary to a cloned Xenopus histone H4 coding sequence in the genomes of Xenopus, Triturus and Ambystoma, amphibian species with widely different C-values (3, 23 and 38 pg DNA/haploid genome, respectively). Quantitative autoradiography indicates that H4 sequences constitute a greater proportion of the genome the larger that genome is. Measurement of the absolute copy-number by reassociation kinetic analysis indicated 47 ± 10, 636 ± 21 and 2685 ± 349 copies per haploid genome in Xenopus, Triturus and Ambystoma, respectively. Whilst this confirms a trend of increasing copy-number with increasing C-value, the two are not directly proportional and some other factors must contribute to determining the number of copies of these genes.
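The "not directly proportional" statement can be checked from the reported figures alone. The sketch below divides each reported H4 copy number by the corresponding C-value. If the two were proportional, the ratio would be roughly constant across species. The variable and function names are illustrative, and the reported measurement errors are ignored for simplicity.

```python
# (C-value in pg DNA per haploid genome, H4 copies per haploid genome),
# taken from the abstract above; the error terms are omitted.
measurements = {
    "Xenopus": (3, 47),
    "Triturus": (23, 636),
    "Ambystoma": (38, 2685),
}


def copies_per_pg(data: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return H4 copies per pg of haploid DNA for each species."""
    return {species: copies / c_value
            for species, (c_value, copies) in data.items()}


density = copies_per_pg(measurements)
for species, value in density.items():
    print(f"{species}: {value:.2f} copies/pg")
# Prints approximately 15.67, 27.65 and 70.66 copies/pg, so copy number
# rises faster than genome size rather than in proportion to it.
```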

15.

Background

Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

Results

Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.

Conclusion

Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
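The percentages in this abstract follow from simple arithmetic, sketched below. The frameshift bound assumes, as a simplification not stated in the abstract, that the 13.9% non-triplet fraction applies uniformly to the estimated 6% of proteins carrying a polymorphism. The variable names are illustrative.

```python
# Detected polymorphism rate: 218 of 13,749 proteins investigated.
detected_fraction = 218 / 13_749

# Figures quoted in the abstract.
estimated_polymorphic_fraction = 0.06  # estimated mean frequency in human proteins
non_triplet_fraction = 0.139           # polymorphisms not a multiple of 3 nt

# Assumed simplification: the non-triplet fraction applies uniformly, so
# the share of proteins with a frameshifting polymorphism is the product.
frameshift_fraction = estimated_polymorphic_fraction * non_triplet_fraction

print(f"detected: {detected_fraction:.2%}")              # 1.59%
print(f"frameshifting bound: {frameshift_fraction:.2%}")  # 0.83%, i.e. "up to 1%"
```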

16.
Submicroscopic chromosomal rearrangements that lead to copy-number changes have been shown to underlie distinctive and recognizable clinical phenotypes. The sensitivity to detect copy-number variation has escalated with the advent of array comparative genomic hybridization (CGH), including BAC and oligonucleotide-based platforms. Coupled with improved assemblies and annotation of genome sequence data, these technologies are facilitating the identification of new syndromes that are associated with submicroscopic genomic changes. Their characterization reveals the role of genome architecture in the aetiology of many clinical disorders. We review a group of genomic disorders that are mediated by segmental duplications, emphasizing the impact that high-throughput detection methods and the availability of the human genome sequence have had on their dissection and diagnosis.

17.
Segmental duplications and copy-number variation in the human genome (cited by: 33 total, 0 self-citations, 33 by others)
The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched >4-fold within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. 
Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders.

18.
Despite considerable excitement over the potential functional significance of copy-number variants (CNVs), we still lack knowledge of the fine-scale architecture of the large majority of CNV regions in the human genome. In this study, we used a high-resolution array-based comparative genomic hybridization (aCGH) platform that targeted known CNV regions of the human genome at approximately 1 kb resolution to interrogate the genomic DNAs of 30 individuals from four HapMap populations. Our results revealed that 1020 of 1153 CNV loci (88%) were actually smaller in size than what is recorded in the Database of Genomic Variants based on previously published studies. A reduction in size of more than 50% was observed for 876 CNV regions (76%). We conclude that the total genomic content of currently known common human CNVs is likely smaller than previously thought. In addition, approximately 8% of the CNV regions observed in multiple individuals exhibited genomic architectural complexity in the form of smaller CNVs within larger ones and CNVs with interindividual variation in breakpoints. Future association studies that aim to capture the potential influences of CNVs on disease phenotypes will need to consider how to best ascertain this previously uncharacterized complexity.
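As a quick consistency check on the counts quoted above, the reported percentages can be recomputed from the raw numbers. The helper name is hypothetical. Both percentages are assumed to share the denominator of 1153 loci, which the abstract states for the first figure and which reproduces the second.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


TOTAL_LOCI = 1153

smaller_than_recorded = percent(1020, TOTAL_LOCI)  # 88, as reported
reduced_by_over_half = percent(876, TOTAL_LOCI)    # 76, as reported
```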

19.
Rearrangements of the genome can be detected by microarray methods and massively parallel sequencing, which identify copy-number alterations and breakpoint junctions, but these techniques are poorly suited to reconstructing the long-range organization of rearranged chromosomes, for example, to distinguish between translocations and insertions. The single-DNA-molecule technique HAPPY mapping is a method for mapping normal genomes that should be able to analyse genome rearrangements, i.e. deviations from a known genome map, to assemble rearrangements into a long-range map. We applied HAPPY mapping to cancer cell lines to show that it could identify rearrangement of genomic segments, even in the presence of normal copies of the genome. We could distinguish a simple interstitial deletion from a copy-number loss at an inversion junction, and detect a known translocation. We could determine whether junctions detected by sequencing were on the same chromosome, by measuring their linkage to each other, and hence map the rearrangement. Finally, we mapped an uncharacterized reciprocal translocation in the T-47D breast cancer cell line to about 2 kb and hence cloned the translocation junctions. We conclude that HAPPY mapping is a versatile tool for determining the structure of rearrangements in the human genome.

20.
Although there are many methods available for inferring copy-number variants (CNVs) from next-generation sequence data, there remains a need for a system that is computationally efficient but that retains good sensitivity and specificity across all types of CNVs. Here, we introduce a new method, estimation by read depth with single-nucleotide variants (ERDS), and use various approaches to compare its performance to other methods. We found that for common CNVs and high-coverage genomes, ERDS performs as well as the best method currently available (Genome STRiP), whereas for rare CNVs and high-coverage genomes, ERDS performs better than any available method. Importantly, ERDS accommodates both unique and highly amplified regions of the genome and does so without requiring separate alignments for calling CNVs and other variants. These comparisons show that for genomes sequenced at high coverage, ERDS provides a computationally convenient method that calls CNVs as well as or better than any currently available method.
