Similar literature
 Found 20 similar articles; search took 31 ms
1.
Copy number variants (CNVs) contribute significantly to human genomic variation, with over 5000 loci reported, covering more than 18% of the euchromatic human genome. Little is known, however, about the origin and stability of variants of different size and complexity. We investigated the breakpoints of 20 small, common deletions, representing a subset of those originally identified by array CGH, using Agilent microarrays, in 50 healthy French Caucasian subjects. By sequencing PCR products amplified using primers designed to span the deleted regions, we determined the exact size and genomic position of the deletions in all affected samples. For each deletion studied, all individuals carrying the deletion share identical upstream and downstream breakpoints at the sequence level, suggesting that the deletion event occurred just once and later became common in the population. This is supported by linkage disequilibrium (LD) analysis, which has revealed that most of the deletions studied are in moderate to strong LD with surrounding SNPs, and have conserved long-range haplotypes. Analysis of the sequences flanking the deletion breakpoints revealed an enrichment of microhomology at the breakpoint junctions. More significantly, we found an enrichment of Alu repeat elements, the overwhelming majority of which intersected deletion breakpoints at their poly-A tails. We found no enrichment of LINE elements or segmental duplications, in contrast to other reports. Sequence analysis revealed enrichment of a conserved motif in the sequences surrounding the deletion breakpoints, although whether this motif has any mechanistic role in the formation of some deletions has yet to be determined. Considered together with existing information on more complex inherited variant regions, and reports of de novo variants associated with autism, these data support the presence of different subgroups of CNV in the genome which may have originated through different mechanisms.

2.
Chromosomal inversions can facilitate local adaptation in the presence of gene flow by suppressing recombination between well‐adapted native haplotypes and poorly adapted migrant haplotypes. East African mountain populations of the honeybee Apis mellifera are highly divergent from neighbouring lowland populations at two extended regions in the genome, despite high similarity in the rest of the genome, suggesting that these genomic regions harbour inversions governing local adaptation. Here, we utilize a new highly contiguous assembly of the honeybee genome to characterize these regions. Using whole‐genome sequencing data from 55 highland and lowland bees, we find that the highland haplotypes at both regions are present at high frequencies in three independent highland populations but extremely rare elsewhere. The boundaries of both divergent regions are characterized by regions of high homology with each other positioned in opposite orientations and contain highly repetitive, long inverted repeats with homology to transposable elements. These regions are likely to represent inversion breakpoints that participate in nonallelic homologous recombination. Using long‐read data, we confirm that the lowland samples are contiguous across breakpoint regions. We do not find evidence for disruption of functional sequence by these breakpoints, which suggests that the inversions are likely maintained due to their allelic content conferring local adaptation in highland environments. Finally, we identify a third divergent genomic region, which contains highly divergent segregating haplotypes that also may contain inversion variants under selection. The results add to a growing body of evidence indicating the importance of chromosomal inversions in local adaptation.

3.
Gibbons are part of the same superfamily (Hominoidea) as humans and great apes, but their karyotype has diverged faster from the common hominoid ancestor. At least 24 major chromosome rearrangements are required to convert the presumed ancestral karyotype of gibbons into that of the hominoid ancestor. Up to 28 additional rearrangements distinguish the various living species from the common gibbon ancestor. Using the northern white-cheeked gibbon (2n = 52) (Nomascus leucogenys leucogenys) as a model, we created a high-resolution map of the homologous regions between the gibbon and human. The positions of 100 synteny breakpoints relative to the assembled human genome were determined at a resolution of about 200 kb. Interestingly, 46% of the gibbon–human synteny breakpoints occur in regions that correspond to segmental duplications in the human lineage, indicating a common source of plasticity leading to a different outcome in the two species. Additionally, the full sequences of 11 gibbon BACs spanning evolutionary breakpoints reveal either segmental duplications or interspersed repeats at the exact breakpoint locations. No specific sequence element appears to be common among independent rearrangements. We speculate that the extraordinarily high level of rearrangements seen in gibbons may be due to factors that increase the incidence of chromosome breakage or fixation of the derivative chromosomes in a homozygous state.

4.
Segmental duplications and copy-number variation in the human genome
The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched >4-fold within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders.

5.
Strong evidence exists for polyploidy having occurred during the evolution of the tribe Brassiceae. We show evidence for the dynamic and ongoing diploidization process by comparative analysis of the sequences of four paralogous Brassica rapa BAC clones and the homologous 124-kb segment of Arabidopsis thaliana chromosome 5. We estimated the times since divergence of the paralogous and homologous lineages. The three paralogous subgenomes of B. rapa triplicated 13 to 17 million years ago (MYA), very soon after the Arabidopsis and Brassica divergence occurred at 17 to 18 MYA. In addition, a pair of BACs represents a more recent segmental duplication, which occurred approximately 0.8 MYA, and provides an exception to the general expectation of three paralogous segments within the B. rapa genome. The Brassica genome segments show extensive interspersed gene loss relative to the inferred structure of the ancestral genome, whereas the Arabidopsis genome segment appears little changed. All 32 genes in the Arabidopsis genome segment have representatives in Brassica, but the hexaploid complement of 96 has been reduced to 54 in the three subgenomes, with compression of the genomic region lengths they occupy to between 52 and 110 kb. The gene content of the recently duplicated B. rapa genome segments is identical, but intergenic sequences differ.

6.
We describe genomic structures of 59 X-chromosome segmental duplications that include the proteolipid protein 1 gene (PLP1) in patients with Pelizaeus-Merzbacher disease. We provide the first report of 13 junction sequences, which gives insight into underlying mechanisms. Although proximal breakpoints were highly variable, distal breakpoints tended to cluster around low-copy repeats (LCRs) (50% of distal breakpoints), and each duplication event appeared to be unique (100 kb to 4.6 Mb in size). Sequence analysis of the junctions revealed no large homologous regions between proximal and distal breakpoints. Most junctions had microhomology of 1-6 bases, and one had a 2-base insertion. Boundaries between single-copy and duplicated DNA were identical to the reference genomic sequence in all patients investigated. Taken together, these data suggest that the tandem duplications are formed by a coupled homologous and nonhomologous recombination mechanism. We suggest repair of a double-stranded break (DSB) by one-sided homologous strand invasion of a sister chromatid, followed by DNA synthesis and nonhomologous end joining with the other end of the break. This is in contrast to other genomic disorders that have recurrent rearrangements formed by nonallelic homologous recombination between LCRs. Interspersed repetitive elements (Alu elements, long interspersed nuclear elements, and long terminal repeats) were found at 18 of the 26 breakpoint sequences studied. No specific motif that may predispose to DSBs was revealed, but single or alternating tracts of purines and pyrimidines that may cause secondary structures were common. Analysis of the 2-Mb region susceptible to duplications identified proximal-specific repeats and distal LCRs in addition to the previously reported ones, suggesting that the unique genomic architecture may have a role in nonrecurrent rearrangements by promoting instability.

7.
Inversion polymorphisms have been linked to a variety of fundamental biological and evolutionary processes. Yet few studies have used large-scale genomic sequencing to directly compare the haplotypes associated with the standard and inverted chromosome arrangements. Here we describe the targeted genomic sequencing and comparison of haplotypes representing alternative arrangements of a common inversion polymorphism linked to a suite of phenotypes in the white-throated sparrow (Zonotrichia albicollis). More than 7.4 Mb of genomic sequence was generated and assembled from both the standard (ZAL2) and inverted (ZAL2(m)) arrangements. Sequencing of a pair of inversion breakpoints led to the identification of a ZAL2-specific segmental duplication, as well as evidence of breakpoint reusage. Comparison of the haplotype-based sequence assemblies revealed low genetic differentiation outside versus inside the inversion indicative of historical patterns of gene flow and suppressed recombination between ZAL2 and ZAL2(m). Finally, despite ZAL2(m) being maintained in a near constant state of heterozygosity, no signatures of genetic degeneration were detected on this chromosome. Overall, these results provide important insights into the genomic attributes of an inversion polymorphism linked to mate choice and variation in social behavior.

8.
We have cloned and sequenced a meiotic recombinational hotspot between the A beta 3 and A beta 2 genes in the major histocompatibility complex (MHC) of the mouse. This recombinational hotspot in the Mus musculus castaneus cas3 haplotype was previously localized to a region of 9.5 kb of DNA in which five independent crossing-over events occurred at the unusually high frequency of 0.6%. Aside from cas3, the hotspot appears to be absent in many other MHC haplotypes. We have now confined the five recombinational breakpoints to a stretch of 3.5 kb of DNA. From the nucleotide sequence around the recombinational breakpoints, determined in the parental cas3 and b haplotypes as well as for two recombinant haplotypes, we show that the two recombinant haplotypes were generated by homologous equal crossing-over and place the breakpoints within two non-overlapping stretches of 10 and 36 bp, respectively. Comparison of the DNA sequences of the hotspot-positive cas3 and the hotspot-negative b haplotypes reveals a number of differences, in particular a CAGA-repeat sequence that is present in six copies in cas3 but in only four copies in C57BL/6 DNA. This repeat sequence is reminiscent of one in a previously characterized hotspot in the E beta gene.

9.
The feasibility of sequencing entire genomes of virtually any organism provides unprecedented insights into the evolutionary history of populations and species. Nevertheless, many population genomic inferences – including the quantification and dating of admixture, introgression and demographic events, and inference of selective sweeps – are still limited by the lack of high‐quality haplotype information. The newest generation of sequencing technology now promises significant progress. To establish the feasibility of haplotype‐resolved genome resequencing at population scale, we investigated properties of linked‐read sequencing data of songbirds of the genus Oenanthe across a range of sequencing depths. Our results based on the comparison of downsampled (25×, 20×, 15×, 10×, 7×, and 5×) with high‐coverage data (46–68×) of seven bird genomes mapped to a reference suggest that phasing contiguities and accuracies adequate for most population genomic analyses can be reached already with moderate sequencing effort. At 15× coverage, phased haplotypes span about 90% of the genome assembly, with 50% and 90% of phased sequences located in phase blocks longer than 1.25–4.6 Mb (N50) and 0.27–0.72 Mb (N90). Phasing accuracy reaches beyond 99% starting from 15× coverage. Higher coverages yielded higher contiguities (up to about 7 Mb/1 Mb [N50/N90] at 25× coverage), but only marginally improved phasing accuracy. Phase block contiguity improved with input DNA molecule length; thus, higher‐quality DNA may help keep sequencing costs at bay. In conclusion, even for organisms with gigabase‐sized genomes like birds, linked‐read sequencing at moderate depth opens an affordable avenue towards haplotype‐resolved genome resequencing at population scale.
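
The N50 and N90 figures quoted in the abstract above are length-weighted contiguity statistics. As a minimal sketch of how such a value is commonly computed (the function name `nx` and the toy block lengths are illustrative assumptions, not taken from the study):

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a collection of block lengths.

    Nx is the length L of the shortest block in the smallest set of
    longest blocks that together cover at least `fraction` of the
    summed length (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ordered = sorted(lengths, reverse=True)   # longest blocks first
    threshold = fraction * sum(ordered)       # length that must be covered
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # only reachable through floating-point rounding


# Hypothetical phase-block lengths in Mb (toy values, not study data).
blocks_mb = [5, 4, 3, 2, 1]
n50 = nx(blocks_mb, 0.5)
n90 = nx(blocks_mb, 0.9)
```

With these toy values the two longest blocks already cover half of the 15 Mb total, so the N50 is the length of the second block; the N90 is necessarily no larger than the N50.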

10.
The genome sequencing of the H37Rv strain of Mycobacterium tuberculosis was completed in 1998, followed by the whole genome sequencing of a clinical isolate, CDC1551, in 2002. Since then, the genomic sequences of a number of other strains have become available, making it one of the better studied pathogenic bacterial species at the genomic level. However, annotation of its genome remains challenging because of high GC content and dissimilarity to other model prokaryotes. To this end, we carried out an in-depth proteogenomic analysis of the M. tuberculosis H37Rv strain using Fourier transform mass spectrometry with high resolution at both MS and tandem MS levels. In all, we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count. In addition to protein database search, we carried out a genome database search, which led to identification of ~250 novel peptides. Based on these novel genome search-specific peptides, we discovered 41 novel protein coding genes in the H37Rv genome. Using peptide evidence and alternative gene prediction tools, we also corrected 79 gene models. Finally, mass spectrometric data from N terminus-derived peptides confirmed 727 existing annotations for translational start sites while correcting those for 33 proteins. We report, for the first time, the creation of a high confidence set of protein coding regions in the Mycobacterium tuberculosis genome obtained by high resolution tandem mass spectrometry at both precursor and fragment detection steps. This proteogenomic approach should be generally applicable to other organisms whose genomes have already been sequenced for obtaining a more accurate catalogue of protein-coding genes.

11.
Using an optimized transformation protocol we have studied the possible interactions between transforming plasmid DNA and the Hansenula polymorpha genome. Plasmids consisting only of a pBR322 replicon, an antibiotic resistance marker for Escherichia coli and the Saccharomyces cerevisiae LEU2 gene were shown to replicate autonomously in the yeast at an approximate copy number of 6 (copies per genome equivalent). This autonomous behaviour is probably due to an H. polymorpha replicon-like sequence present on the S. cerevisiae LEU2 gene fragment. Plasmids replicated as multimers consisting of monomers connected in a head-to-tail configuration. Two out of nine transformants analysed appeared to contain plasmid multimers in which one of the monomers contained a deletion. Plasmids containing internal or flanking regions of the genomic alcohol oxidase gene were shown to integrate by homologous single or double cross-over recombination. Both single- and multi-copy (two or three) tandem integrations were observed. Targeted integration occurred in 1-22% of the cases and was only observed with plasmids linearized within the genomic sequences, indicating that homologous linear ends are recombinogenic in H. polymorpha. In the cases in which no targeted integration occurred, double-strand breaks were efficiently repaired in a homology-independent way. Repair of double-strand breaks was precise in 50-68% of the cases. Linearization within homologous as well as nonhomologous plasmid regions stimulated transformation frequencies up to 15-fold.

12.
Current cytogenetic methods (e.g., G-banding and multicolor chromosomal painting) allow detection of translocation events but lack the resolution to (a) locate the breakpoints precisely at the chromosome band level or (b) discriminate balanced translocations from translocations with copy number alterations not previously reported, or imperfectly balanced translocations. In this study, we demonstrate that cytogenetically balanced translocations are in fact frequently associated with segmental gain or loss of DNA. The recent development of a whole genome tiling path BAC array has enabled tiling resolution analysis of genomic segmental copy number status. Combining tiling resolution BAC array comparative genomic hybridization (array CGH) with G-Banding analysis and multicolor chromosomal painting approaches such as spectral karyotyping (SKY) facilitates high-resolution mapping of genomic alterations associated with imperfectly balanced translocations. Using a refined version of our CGH array we have deduced the copy number status throughout the genomes of three cytogenetically well-characterized prostate cancer cell lines (PC3, DU145, LNCaP) to determine whether translocations are associated with focal gains and losses of DNA. At 78 kb tiling resolution we identified the boundaries of 170, 80, and 34 known and novel copy number alterations (CNA) in these cell line genomes, respectively. Thirty-three of the 36 known translocations (92%, P < 0.001) in DU145 were associated with segmental CNA. Likewise, 80% (P < 0.001) of the known translocations showed association in LNCaP. Although many translocation breakpoints exhibit segmental alteration in PC3, the pattern of chromosomal rearrangements is too complex for use in comprehensive association with CNA boundaries. Our results reveal that imperfectly balanced translocations in tumor genomes are a phenomenon that occurs at frequencies much higher than previously demonstrated. 

13.
Velo-cardio-facial syndrome (VCFS) is the most common microdeletion syndrome in humans. It occurs with an estimated frequency of 1 in 4,000 live births. Most cases occur sporadically, indicating that the deletion is recurrent in the population. More than 90% of patients with VCFS and a 22q11 deletion have a similar 3-Mb hemizygous deletion, suggesting that sequences at the breakpoints confer susceptibility to rearrangements. To define the region containing the chromosome breakpoints, we constructed an 8-kb-resolution physical map. We identified a low-copy repeat in the vicinity of both breakpoints. A set of genetic markers was integrated into the physical map to determine whether the deletions occur within the repeat. Haplotype analysis with genetic markers that flank the repeats showed that most patients with VCFS had deletion breakpoints in the repeat. Within the repeat is a 200-kb duplication of sequences, including a tandem repeat of genes/pseudogenes, surrounding the breakpoints. The genes in the repeat are GGT, BCRL, V7-rel, POM121-like, and GGT-rel. Physical mapping and genomic fingerprint analysis showed that the repeats are virtually identical in the 200-kb region, suggesting that the deletion is mediated by homologous recombination. Examination of two three-generation families showed that meiotic intrachromosomal recombination mediated the deletion.

14.
Artificial selection (domestication and breeding) leaves a strong footprint in plant genomes. Second generation high throughput DNA sequencing technologies make it possible to sequence the gene complement of a plant genome within 3 to 5 months, and the costs of doing so are declining very quickly. This makes it practical to identify genomic regions that have undergone very strong selection. Available reference sequences of important crops such as rice, maize, and sorghum will promote the wide use of re-sequencing strategies in these crops. Marker/trait associations, especially haplotype (or haplotype block) association analyses, will help the precise mapping of important genomic regions and location of favored alleles or haplotypes for breeding. This mini-review examines a genomics approach to defining yield traits in wheat.

15.
The genomes of nonhuman primates are powerful references for better understanding the recent evolution of the human genome. Here we compare the order of 802 genomic markers mapped in a rhesus macaque (Macaca mulatta) radiation hybrid panel with the human genome, allowing for nearly complete cross-reference to the human genome at an average resolution of 3.5 Mb. At least 23 large-scale chromosomal rearrangements, mostly inversions, are needed to explain the changes in marker order between human and macaque. Analysis of the breakpoints flanking inverted chromosomal segments and estimation of their duplication divergence dates provide additional evidence implicating segmental duplications as a major mechanism of chromosomal rearrangement in recent primate evolution.

16.
17.
18.
19.
20.
Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.
