Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Patterns of segmental duplication in the human genome   Cited by: 12 (self-citations: 0; citations by others: 12)
We analyzed the completed human genome for recent segmental duplications (size ≥ 1 kb and sequence similarity ≥ 90%). We found that approximately 4% of the genome is covered by duplications and that the extent of segmental duplication varies from 1% to 14% among the 24 chromosomes. Intrachromosomal duplication is more frequent than interchromosomal duplication in 15 chromosomes. The duplication frequencies in pericentromeric and subtelomeric regions are greater than the genome average by approximately threefold and fourfold. We examined factors that may affect the frequency of duplication in a region. Within individual chromosomes, the duplication frequency shows little correlation with local gene density, repeat density, recombination rate, and GC content, except on chromosomes 7 and Y. For the entire genome, the duplication frequency is correlated with each of the above factors. Based on known genes and Ensembl genes, the proportion of duplications containing complete genes is 3.4% and 10.7%, respectively. The proportion of duplications containing genes is higher in intrachromosomal than in interchromosomal duplications, and duplications containing genes have a higher sequence similarity and tend to be longer than duplications containing no genes. Our simulation suggests that many duplications containing genes have been selectively maintained in the genome.

2.
3.
Interpreting the genomic and phenotypic consequences of copy-number variation (CNV) is essential to understanding the etiology of genetic disorders. Whereas deletion CNVs lead obviously to haploinsufficiency, duplications might cause disease through triplosensitivity, gene disruption, or gene fusion at breakpoints. The mutational spectrum of duplications has been studied at certain loci, and in some cases these copy-number gains are complex chromosome rearrangements involving triplications and/or inversions. However, the organization of clinically relevant duplications throughout the genome has yet to be investigated on a large scale. Here we fine-mapped 184 germline duplications (14.7 kb–25.3 Mb; median 532 kb) ascertained from individuals referred for diagnostic cytogenetics testing. We performed next-generation sequencing (NGS) and whole-genome sequencing (WGS) to sequence 130 breakpoints from 112 subjects with 119 CNVs and found that most (83%) were tandem duplications in direct orientation. The remainder were triplications embedded within duplications (8.4%), adjacent duplications (4.2%), insertional translocations (2.5%), or other complex rearrangements (1.7%). Moreover, we predicted six in-frame fusion genes at sequenced duplication breakpoints; four gene fusions were formed by tandem duplications, one by two interconnected duplications, and one by duplication inserted at another locus. These unique fusion genes could be related to clinical phenotypes and warrant further study. Although most duplications are positioned head-to-tail adjacent to the original locus, those that are inverted, triplicated, or inserted can disrupt or fuse genes in a manner that might not be predicted by conventional copy-number assays. Therefore, interpreting the genetic consequences of duplication CNVs requires breakpoint-level analysis.

4.
Segmental duplications and copy-number variation in the human genome   Cited by: 33 (self-citations: 0; citations by others: 33)
The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched >4-fold within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. 
Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders.

5.
Copy-number variations cause genomic disorders. Triplications, unlike deletions and duplications, are poorly understood because of challenges in molecular identification, the choice of a proper model system for study, and awareness of their phenotypic consequences. We investigated the genomic disorder Charcot-Marie-Tooth disease type 1A (CMT1A), a dominant peripheral neuropathy caused by a 1.4 Mb recurrent duplication occurring by nonallelic homologous recombination. We identified CMT1A triplications in families in which the duplication segregates. The triplications arose de novo from maternally transmitted duplications and caused a more severe distal symmetric polyneuropathy phenotype. The recombination that generated the triplication occurred between sister chromatids on the duplication-bearing chromosome and could accompany gene conversions with the homologous chromosome. Diagnostic testing for CMT1A (n = 20,661 individuals) identified 13% (n = 2,752 individuals) with duplication and 0.024% (n = 5 individuals) with segmental tetrasomy, suggesting that triplications emerge from duplications at a rate as high as ∼1:550, which is more frequent than the rate of de novo duplication. We propose that individuals with duplications are predisposed to acquiring triplications and that the population prevalence of triplication is underascertained.

6.
Gene duplication plays important roles in organismal evolution, because duplicate genes provide raw materials for the evolution of mechanisms controlling physiological and/or morphological novelties. Gene duplication can occur via several mechanisms, including segmental duplication, tandem duplication and retroposition. Although segmental and tandem duplications have been found to be important for the expansion of a number of multigene families, the contribution of retroposition is not clear. Here we show that plant SKP1 genes have evolved by multiple duplication events from a single ancestral copy in the most recent common ancestor (MRCA) of eudicots and monocots, resulting in 19 ASK (Arabidopsis SKP1-like) and 28 OSK (Oryza SKP1-like) genes. The estimated birth rates are more than ten times the average rate of gene duplication, and are even higher than that of other rapidly duplicating plant genes, such as type I MADS box genes, R genes, and genes encoding receptor-like kinases. Further analyses suggest that a relatively large proportion of the duplication events may be explained by tandem duplication, but few, if any, are likely to be due to segmental duplication. In addition, by mapping the gain/loss of a specific intron on gene phylogenies, and by searching for the features that characterize retrogenes/retrosequences, we show that retroposition is an important mechanism for expansion of the plant SKP1 gene family. Specifically, we propose that two and three ancient retroposition events occurred in lineages leading to Arabidopsis and rice, respectively, followed by repeated tandem duplications and chromosome rearrangements. Our study represents a thorough investigation showing that retroposition can play an important role in the evolution of a plant gene family whose members do not encode mobile elements.

7.
Insights into the origins of structural variation and the mutational mechanisms underlying genomic disorders would be greatly improved by a genomewide map of hotspots of nonallelic homologous recombination (NAHR). Moreover, our understanding of sequence variation within the duplicated sequences that are substrates for NAHR lags far behind that of sequence variation within the single-copy portion of the genome. Perhaps the best-characterized NAHR hotspot lies within the 24-kb-long Charcot-Marie-Tooth disease type 1A (CMT1A)-repeats (REPs) that sponsor deletions and duplications that cause peripheral neuropathies. We investigated structural and sequence diversity within the CMT1A-REPs, both within and between species. We discovered a high frequency of retroelement insertions, accelerated sequence evolution after duplication, extensive paralogous gene conversion, and a greater than twofold enrichment of SNPs in humans relative to the genome average. We identified an allelic recombination hotspot underlying the known NAHR hotspot, which suggests that the two processes are intimately related. Finally, we used our data to develop a novel method for inferring the location of an NAHR hotspot from sequence variation within segmental duplications and applied it to identify a putative NAHR hotspot within the LCR22 repeats that sponsor velocardiofacial syndrome deletions. We propose that a large-scale project to map sequence variation within segmental duplications would reveal a wealth of novel chromosomal-rearrangement hotspots.

8.
Genome duplications may have played a role in the early stages of vertebrate evolution, near the time of divergence of the lamprey lineage. Additional genome duplication, specifically in ray-finned fish, may have occurred before the divergence of the teleosts. The common carp (Cyprinus carpio) has been considered tetraploid because of its chromosome number (2n = 100) and its high DNA content. We studied variation using 59 microsatellite primer pairs to better understand the ploidy level of the common carp. Based on the number of PCR amplicons per individual, about 60% of these primer pairs are estimated to amplify duplicates. Segregation patterns in families suggested a partially duplicated genome structure and disomic inheritance. This could suggest that the common carp is tetraploid and that polyploidy occurred by hybridization (allotetraploidy). From sequences of microsatellite flanking regions, we estimated the difference per base between pairs of alleles and between pairs of paralogs. The distribution of differences between paralogs had two distinct modes suggesting one whole-genome duplication and a more recent wave of segmental duplications. The genome duplication was estimated to have occurred about 12 MYA, with the segmental duplications occurring between 2.3 and 6.8 MYA. At 12 MYA, this would be one of the most recent genome duplications among vertebrates. Phylogenetic analysis of several cyprinid species suggests an evolutionary model for this tetraploidization, with a role for polyploidization in speciation and diversification.

9.
Several years ago, we initiated a long-term project of cloning new human ATP-binding cassette (ABC) transporters and linking them to various disease phenotypes. As one of the results of this project, we present two new members of the human ABCC subfamily, ABCC11 and ABCC12. These two new human ABC transporters were fully characterized and mapped to human chromosome 16q12. With the addition of these two genes, the complete human ABCC subfamily has 12 identified members (ABCC1-12), nine from the multidrug resistance-like subgroup, two from the sulfonylurea receptor subgroup, and the CFTR gene. Phylogenetic analysis determined that ABCC11 and ABCC12 are derived by duplication, and are most closely related to the ABCC5 gene. Genetic variation in some ABCC subfamily members is associated with human inherited diseases, including cystic fibrosis (CFTR/ABCC7), Dubin-Johnson syndrome (ABCC2), pseudoxanthoma elasticum (ABCC6) and familial persistent hyperinsulinemic hypoglycemia of infancy (ABCC8). Since ABCC11 and ABCC12 were mapped to a region harboring gene(s) for paroxysmal kinesigenic choreoathetosis, the two genes represent positional candidates for this disorder.

10.

Background

Duplications of stretches of the genome are an important source of individual genetic variation, but their unrecognized presence in laboratory organisms would be a confounding variable for genetic analysis.

Results

We report here that duplications of 15 kb or more are common in the genome of the social amoeba Dictyostelium discoideum. Most stocks of the axenic 'workhorse' strains Ax2 and Ax3/4 obtained from different laboratories can be expected to carry different duplications. The auxotrophic strains DH1 and JH10 also bear previously unreported duplications. Strain Ax3/4 is known to carry a large duplication on chromosome 2 and this structure shows evidence of continuing instability; we find a further variable duplication on chromosome 5. These duplications are lacking in Ax2, which has instead a small duplication on chromosome 1. Stocks of the type isolate NC4 are similarly variable, though we have identified some approximating the assumed ancestral genotype. More recent wild-type isolates are almost without large duplications, but we can identify small deletions or regions of high divergence, possibly reflecting responses to local selective pressures. Duplications are scattered through most of the genome, and can be stable enough to reconstruct genealogies spanning decades of the history of the NC4 lineage. The expression level of many duplicated genes is increased with dosage, but for others it appears that some form of dosage compensation occurs.

Conclusion

The genetic variation described here must underlie some of the phenotypic variation observed between strains from different laboratories. We suggest courses of action to alleviate the problem.

11.
Recent segmental and gene duplications in the mouse genome   Cited by: 2 (self-citations: 0; citations by others: 2)

Background

The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies.

Results

We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.

Conclusion

Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the identified duplication content presented in this work. This data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis.

12.
Previous studies of repeat-induced point mutation (RIP) have typically involved gene-size duplications resulting from insertion of transforming DNA at ectopic chromosomal positions. To ascertain whether genes in larger duplications are subject to RIP, progeny were examined from crosses heterozygous for long segmental duplications obtained using insertional or quasiterminal translocations. Of 17 distinct mutations from crossing 11 different duplications, 13 mapped within the segment that was duplicated in the parent, one was closely linked, and three were unlinked. Half of the mutations in duplicated segments were at previously unknown loci. The mutations were recessive and were expressed both in haploid and in duplication progeny from Duplication × Normal, suggesting that both copies of the wild-type gene had undergone RIP. Seven transition mutations characteristic of RIP were found in 395 base pairs (bp) examined in one ro-11 allele from these crosses and three were found in ~750 bp of another. A single chain-terminating C to T mutation was found in 800 bp of arg-6. RIP is thus responsible. These results are consistent with the idea that the impaired fertility that is characteristic of segmental duplications is due to inactivation by RIP of genes needed for progression through the sexual cycle.

13.
Genome-level evolution of resistance genes in Arabidopsis thaliana   Cited by: 2 (self-citations: 0; citations by others: 2)
Baumgarten A, Cannon S, Spangler R, May G. Genetics, 2003, 165(1): 309-319
Pathogen resistance genes represent some of the most abundant and diverse gene families found within plant genomes. However, evolutionary mechanisms generating resistance gene diversity at the genome level are not well understood. We used the complete Arabidopsis thaliana genome sequence to show that most duplication of individual NBS-LRR sequences occurs at close physical proximity to the parent sequence and generates clusters of closely related NBS-LRR sequences. Deploying the statistical strength of phylogeographic approaches and using chromosomal location as a proxy for spatial location, we show that apparent duplication of NBS-LRR genes to ectopic chromosomal locations is largely the consequence of segmental chromosome duplication and rearrangement, rather than the independent duplication of individual sequences. Although accounting for a smaller fraction of NBS-LRR gene duplications, segmental chromosome duplication and rearrangement events have a large impact on the evolution of this multigene family. Intergenic exchange is dramatically lower between NBS-LRR sequences located in different chromosome regions as compared to exchange between sequences within the same chromosome region. Consequently, once translocated to new chromosome locations, NBS-LRR gene copies have a greater likelihood of escaping intergenic exchange and adopting new functions than do gene copies located within the same chromosomal region. We propose an evolutionary model that relates processes of genome evolution to mechanisms of evolution for the large, diverse, NBS-LRR gene family.

14.
PKD1, the locus most commonly affected by mutations that produce autosomal dominant polycystic kidney disease (ADPKD), has previously been localized to chromosome 16p13.3. Since no cytogenetic abnormalities have been found in association with ADPKD, flanking genetic markers have been required to define an interval (the PKD1 region) that contains the PKD1 gene. In this report we demonstrate, through the construction of a long-range restriction map that links the flanking genetic markers GGG1 (D16S84) and 26.6PROX (D16S125), that the PKD1 gene lies within an extremely CpG-rich 750-kb segment of chromosome 16p13.3. Approximately 90% of this region has been cloned in three extensive cosmid/bacteriophage contigs. The cloned DNA is a valuable resource for identifying new closer flanking genetic markers and for isolating candidate genes from the region.

15.
An unexpected finding of the human genome was the large fraction of the genome organized as blocks of interspersed duplicated sequence. We provide a comparative and phylogenetic analysis of a highly duplicated region of 16p12.2, which is composed of at least four different segmental duplications spanning in excess of 160 kb. We contrast the dispersal of two different segmental duplications (LCR16a and LCR16u). LCR16a, a 20 kb low-copy repeat sequence A from chromosome 16, was shown previously to contain a rapidly evolving novel hominoid gene family (morpheus) that had expanded within the last 10 million years of great ape/human evolution. We compare the dispersal of this genomic segment with a second adjacent duplication called LCR16u. The duplication contains a second putative gene family (KIAA0220/SMG1) that is represented approximately eight times within the human genome. A high degree of sequence identity (approximately 98%) was observed among the various copies of LCR16u. Comparative analyses with Old World monkey species show that LCR16a and LCR16u originated from two distinct ancestral loci. Within the human genome, at least 70% of the LCR16u copies were duplicated in concert with the LCR16a duplication. In contrast, only 30% of the chimpanzee loci show an association between LCR16a and LCR16u duplications. The data suggest that the two copies of genomic sequence were brought together during the chimpanzee/human divergence and were subsequently duplicated as a larger cassette specifically within the human lineage. The evolutionary history of these two chromosome-specific duplications supports a model of rapid expansion and evolutionary turnover among the genomes of man and the great apes.

16.
Koszul R, Dujon B, Fischer G. Genetics, 2006, 172(4): 2211-2222
The high level of gene redundancy that characterizes eukaryotic genomes results in part from segmental duplications. Spontaneous duplications of large chromosomal segments have been experimentally demonstrated in yeast. However, the dynamics of inheritance of such structures and their eventual fixation in populations remain largely unsolved. We analyzed the stability of a vast panel of large segmental duplications in Saccharomyces cerevisiae (from 41 kb for the smallest to 268 kb for the largest). We monitored the stability of three different types of interchromosomal duplications as well as that of three intrachromosomal direct tandem duplications. In the absence of any selective advantage associated with the presence of the duplication, we show that a duplicated segment internally translocated within a natural chromosome is stably inherited both mitotically and meiotically. By contrast, large duplications carried by a supernumerary chromosome are highly unstable. Duplications translocated into subtelomeric regions are lost at variable rates depending on the location of the insertion sites. Direct tandem duplications are lost by unequal crossing over, both mitotically and meiotically, at a frequency proportional to their sizes. These results show that most of the duplicated structures present an intrinsic level of instability. However, translocation within another chromosome significantly stabilizes a duplicated segment, increasing its chance to get fixed in a population even in the absence of any immediate selective advantage conferred by the duplicated genes.

17.
Autosomal dominant polycystic kidney disease (ADPKD) is one of the most commonly inherited renal diseases. ADPKD is a genetically heterogeneous disorder involving at least three different genes. PKD1, the major locus mapped to chromosome 16p13.3, accounts for approximately 85% of ADPKD cases. The search for mutations is a very important step in understanding the molecular mechanisms underlying ADPKD. Despite intense screening by many groups, only a small number of mutations have been described so far. We undertook the first study using denaturing gradient gel electrophoresis (DGGE) to scan for mutations in the non-duplicated region of the PKD1 gene in a large cohort of 146 unrelated French ADPKD patients. We successfully identified novel mutations: 3 frameshift mutations, 2 nonsense mutations, 2 missense mutations, 1 in-frame insertion of 9 nucleotides, 3 intronic variations and several polymorphisms. One of these mutations is the fourth de novo mutation described in this gene. We also describe a family with possible clinical anticipation. DGGE is an effective method for detecting nucleotide changes in the PKD1 gene.

18.
Detection of tandem duplications and implications for linkage analysis.   Cited by: 1 (self-citations: 1; citations by others: 0)
The first demonstration of an autosomal dominant human disease caused by segmental trisomy came in 1991 for Charcot-Marie-Tooth disease type 1A (CMT1A). For this disorder, the segmental trisomy is due to a large tandem duplication of 1.5 Mb of DNA located on chromosome 17p11.2-p12. The search for the CMT1A disease gene was misdirected and impeded because some chromosome 17 genetic markers that are linked to CMT1A lie within this duplication. To better understand how such a duplication might affect genetic analyses in the context of disease gene mapping, we studied the effects of marker duplication on transmission probabilities of marker alleles, on linkage analysis of an autosomal dominant disease, and on tests of linkage homogeneity. We demonstrate that the undetected presence of a duplication distorts transmission ratios, hampers fine localization of the disease gene, and increases false evidence of linkage heterogeneity. In addition, we devised a likelihood-based method for detecting the presence of a tandemly duplicated marker when one is suspected. We tested our methods through computer simulations and on CMT1A pedigrees genotyped at several chromosome 17 markers. On the simulated data, our method detected 96% of duplicated markers (with a false-positive rate of 5%). On the CMT1A data our method successfully identified two of three loci that are duplicated (with no false positives). This method could be used to identify duplicated markers in other regions of the genome and could be used to delineate the extent of duplications similar to that involved in CMT1A.

19.
Gene duplication has certainly played a major role in structuring vertebrate genomes but the extent and nature of the duplication events involved remains controversial. A recent study identified two major episodes of gene duplication: one episode of putative genome duplication ca. 500 Myr ago and a more recent gene-family expansion attributed to segmental or tandem duplications. We confirm this pattern using methods not reliant on molecular clocks for individual gene families. However, analysis of a simple model of the birth-death process suggests that the apparent recent episode of duplication is an artefact of the birth-death process. We show that a constant-rate birth-death model is appropriate for gene duplication data, allowing us to estimate the rate of gene duplication and loss in the vertebrate genome over the last 200 Myr (0.00115 and 0.00740 Myr⁻¹ lineage⁻¹, respectively). Finally, we show that increasing rates of gene loss reduce the impact of a genome-wide duplication event on the distribution of gene duplications through time.

20.
Inverted duplications are a common type of copy number variation (CNV) in germline and somatic genomes. Large duplications that include many genes can lead to both neurodevelopmental phenotypes in children and gene amplifications in tumors. There are several models for inverted duplication formation, most of which include a dicentric chromosome intermediate followed by breakage-fusion-bridge (BFB) cycles, but the mechanisms that give rise to the inverted dicentric chromosome in most inverted duplications remain unknown. Here we have combined high-resolution array CGH, custom sequence capture, next-generation sequencing, and long-range PCR to analyze the breakpoints of 50 nonrecurrent inverted duplications in patients with intellectual disability, autism, and congenital anomalies. For half of the rearrangements in our study, we sequenced at least one breakpoint junction. Sequence analysis of breakpoint junctions reveals a normal-copy disomic spacer between inverted and non-inverted copies of the duplication. Further, short inverted sequences are present at the boundary of the disomic spacer and the inverted duplication. These data support a mechanism of inverted duplication formation whereby a chromosome with a double-strand break intrastrand pairs with itself to form a “fold-back” intermediate that, after DNA replication, produces a dicentric inverted chromosome with a disomic spacer corresponding to the site of the fold-back loop. This process can lead to inverted duplications adjacent to terminal deletions, inverted duplications juxtaposed to translocations, and inverted duplication ring chromosomes.
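
Several abstracts in this list define a "recent segmental duplication" by two thresholds: a minimum length and a minimum sequence identity (≥ 1 kb and ≥ 90% in entry 1; ≥ 5 kb and ≥ 90% in entry 11). The sketch below illustrates only that threshold rule. The `Alignment` record, the function names, and the sample hits are hypothetical and are not taken from any of the listed papers, whose actual pipelines (for example, the BLAST-based heuristics mentioned in entry 11) are not described in enough detail here to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One pairwise self-alignment hit (hypothetical record layout)."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction of identical aligned bases, 0.0 to 1.0


def is_recent_segmental_duplication(hit, min_length_bp=1_000, min_identity=0.90):
    """Apply the length and identity thresholds quoted in the abstracts."""
    return hit.length_bp >= min_length_bp and hit.identity >= min_identity


def classify(hit):
    """Label a hit by whether both copies lie on the same chromosome."""
    return "intrachromosomal" if hit.chrom_a == hit.chrom_b else "interchromosomal"


# Made-up example hits, for illustration only.
hits = [
    Alignment("chr7", "chr7", 12_500, 0.97),   # long, high identity
    Alignment("chr1", "chr16", 800, 0.99),     # too short
    Alignment("chr2", "chr9", 4_200, 0.85),    # identity too low
    Alignment("chr1", "chr16", 3_000, 0.93),   # passes 1 kb, fails 5 kb
]

# Thresholds as stated in entry 1 (>= 1 kb, >= 90%).
kept_1kb = [h for h in hits if is_recent_segmental_duplication(h)]

# Thresholds as stated in entry 11 (>= 5 kb, >= 90%).
kept_5kb = [h for h in hits if is_recent_segmental_duplication(h, min_length_bp=5_000)]

for h in kept_1kb:
    print(classify(h), h.length_bp, h.identity)
```

Raising `min_length_bp` from 1,000 to 5,000 drops the 3 kb hit, which shows why duplication content reported under different size cutoffs is not directly comparable.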
