Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Intrachromosomal and interchromosomal segmental duplications account for more than 5% of the human genome. To analyze the processes resulting in the complex mosaic structure of duplicons, a draft human genome sequence was searched for duplicated segments of a genomic fragment from the pericentric region of the chromosome 21 short arm. The duplicons found consist of modules having paralogs in various genome regions. Module ends are flanked by various tandem or interspersed repeats, which are less stable than unique sequences. In most cases, the boundaries of duplicated segments exactly coincide with, or lie in close proximity to, hot spots of various rearrangements within repeats, boundaries between repeats and unique sequences, or boundaries between two different repeats. Homologous recombination between repetitive elements was assumed to be the major mechanism contributing to the mosaic structure of duplicons.

2.
The genomic sequences within the alpha-block (approximately 288-310 kb) of the human and chimpanzee MHC class I region contain ten MHC class I genes and three MIC gene fragments grouped together within alternating duplicated genomic segments, or duplicons. In this study, the chimpanzee and human genomic sequences were analyzed in order to determine whether the remnants of the ERVK9 and other retrotransposon sequences are useful genomic markers for reconstructing the evolutionary history of the duplicated MHC gene families within the alpha-block. A variety of genes, pseudogenes, autologous DNA transposons and retrotransposons such as Alu and ERVK9 were used to categorize the ten duplicons into four distinct structural groups. The phylogenetic relationship of the ten duplicons was examined by using the neighbour joining method to analyze transposon sequence topologies of selected Alu members, LTR16B and Charlie9. On the basis of these structural groups and the phylogeny of the duplicated transposon sequences, a duplication model was reconstructed involving four multipartite tandem duplication steps to explain the organization and evolution of the ten duplicons within the alpha-block of the chimpanzee and human. The phylogenetic analysis and inferred duplication history suggest that Patr/HLA-F was the first MHC class I gene to have been fixed and was not required as a precursor for further duplication within the alpha-block of the ancestral species.

3.
The human CD1 proteins belong to a lipid-glycolipid antigen-presenting gene family and are related in structure and function to the MHC class I molecules. Previous mapping and DNA hybridization studies have shown that five linked genes located within a cluster on human chromosome 1q22-23 encode the CD1 protein family. We have analyzed the complete genomic sequence of the human CD1 gene cluster and found that the five active genes are distributed over 175,600 nucleotides and separated by four expanded intervening genomic regions (IGRs) ranging in length between 20 and 68 kb. The IGRs are composed mostly of retroelements including five full-length L1 PA sequences and various pseudogenes. Some L1 sequences have acted as receptors for other subtypes or families of retroelements. Alu molecular clocks that have evolved during primate history are found distributed within the HLA class I duplicated segments (duplicons) but not within the duplicons of CD1. Phylogeny of the alpha3 domain of the class I-like superfamily of proteins shows that the CD1 cluster is well separated from HLA class I by a number of superfamily members including MIC (PERB11), HFE, Zn-alpha2-GP, FcRn, and MR1. Phylogenetically, the human CD1 sequences are interspersed by CD1 sequences from other mammalian species, whereas the human HLA class I sequences cluster together and are separated from the other mammalian sequences. Genomic and phylogenetic analyses support the view that the human CD1 gene copies were duplicated prior to the evolution of primates and the bulk of the HLA class I genes found in humans. In contrast to the HLA class I genomic structure, the human CD1 duplicons are smaller in size, they lack Alu clocks, and they are interrupted by IGRs at least 4 to 14 times longer than the CD1 genes themselves. The IGRs seem to have been created as "buffer zones" to protect the CD1 genes from disruption by transposable elements.

4.
5.
The plastid genome of Trifolium subterraneum is 144,763 bp, about 20 kb longer than those of closely related legumes, which also lost one copy of the large inverted repeat (IR). The genome has undergone extensive genomic reconfiguration, including the loss of six genes (accD, infA, rpl22, rps16, rps18, and ycf1) and two introns (clpP and rps12) and numerous gene order changes, attributable to 14–18 inversions. All endpoints of rearranged gene clusters are flanked by repeated sequences, tRNAs, or pseudogenes. One unusual feature of the Trifolium subterraneum genome is the large number of dispersed repeats, which comprise 19.5% (ca. 28 kb) of the genome (versus about 4% for other angiosperms) and account for part of the increase in genome size. Nine genes (psbT, rbcL, clpP, rps3, rpl23, atpB, psbN, trnI-cau, and ycf3) have also been duplicated either partially or completely. rpl23 is the most highly duplicated gene, with portions of this gene duplicated six times. Comparisons of the Trifolium plastid genome with the Plant Repeat Database and searches for flanking inverted repeats suggest that the high incidence of dispersed repeats and rearrangements is not likely the result of transposition. Trifolium has 19.5 kb of unique DNA distributed among 160 fragments ranging in size from 30 to 494 bp, greatly surpassing the other five sequenced legume plastid genomes in novel DNA content. At least some of this unique DNA may represent horizontal transfer from bacterial genomes. These unusual features provide direction for the development of more complex models of plastid genome evolution.

6.
The human genome is composed of large sequence segments with fairly homogeneous GC content, namely isochores, which have been linked to many important functions; biological implications of most isochore boundaries, however, remain elusive, partly due to the difficulty in determining these boundaries at high resolution. Using the segmentation algorithm based on the quadratic divergence, we re-determined all 79 boundaries of previously identified human isochores at single-nucleotide resolution, and then compared the boundary coordinates with other genome features. We found that 55.7% of isochore boundaries coincide with termini of repeat elements; 45.6% of isochore boundaries coincide with termini of highly conserved sequences based on alignment of 17 vertebrate genomes, i.e., the highly conserved genome sequence switches to a less conserved or non-conserved one at the isochore boundary; and some isochore boundaries coincide with an abrupt change in CpG island distribution (note that one boundary can be associated with more than one genome feature). In addition, sequences around isochore boundaries are highly conserved. It seems reasonable to deduce that the boundaries of all the isochores studied here would be replication timing sites in the human genome. These results suggest possible key roles of the isochore boundaries and may further our understanding of the human genome organization.
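
As a rough illustration of boundary detection by compositional segmentation, the Python sketch below finds the single cut point in a sequence that maximizes a length-weighted squared difference in GC fraction between the two sides. This score is only a stand-in assumed for illustration: the abstract does not define its quadratic divergence, and the function name `best_gc_split` and the `min_len` parameter are hypothetical.

```python
def best_gc_split(seq, min_len=1):
    """Return (index, score) of the cut that best separates GC-poor from GC-rich.

    The score is (n_left/n) * (n_right/n) * (gc_left - gc_right)**2, an
    illustrative stand-in for a divergence measure, not the published one.
    The cut at index i splits seq into seq[:i] and seq[i:].
    """
    n = len(seq)
    is_gc = [1 if base in "GCgc" else 0 for base in seq]
    total_gc = sum(is_gc)
    best_idx, best_score = None, -1.0
    left_gc = 0
    for i in range(1, n):
        left_gc += is_gc[i - 1]          # GC count in seq[:i]
        if i < min_len or n - i < min_len:
            continue                     # skip cuts that leave a too-short side
        gc_left = left_gc / i
        gc_right = (total_gc - left_gc) / (n - i)
        score = (i / n) * ((n - i) / n) * (gc_left - gc_right) ** 2
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


# Toy sequence: 100 bp with no G/C followed by 100 bp of pure G/C.
toy = "AT" * 50 + "GC" * 50
cut, score = best_gc_split(toy)
```

Applying such a split recursively to each side, with a stopping rule, is one common way to turn a single-cut search into a full segmentation; the stopping rule is left out here because the abstract does not describe one.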

7.

Background  

Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns involves repeated aggregation and subsequent duplication of genomic sequences.

8.
The class I region of the major histocompatibility complex contains two subgenomic blocks (250–350 kb each), known as the alpha and beta blocks. These blocks contain members of multicopy gene families including HLA class I, HERV-16 (previously called P5 sequences), and PERB11 (MIC). We have previously shown that each block consists of imperfect duplicated segments (duplicons) containing linked members of different gene families, retroelements and transposons that have coevolved as part of two separate evolutionary events. Another region provisionally designated here as the kappa block is located between the alpha and the beta blocks and contains HLA-E, -30, and -92, HERV-16 (P5.3), and PERB11.3 (MICC) within about 250 kb of sequence. Using Alu elements to trace the evolutionary relationships between different class I duplicons, we have found that (a) the kappa block contains paralogous (duplicated) Alu J sequences and other retroelement patterns more in common with the beta than the alpha block; (b) the retroelement pattern associated with the HLA-E duplicon is different from all other HLA class I duplicons, indicating a more complex evolution; (c) the HLA-92 duplicon, although substantially shorter, is closely related in sequence to the HLA-B and -C duplicons; (d) two of the six paralogous Alu J elements within the HLA-B and -C duplicons are associated with the HLA-X duplicon, confirming their evolutionary relationships within the beta block; and (e) the paralogous Alu J elements within the alpha block are distinctly different from those identified within the beta and kappa blocks. The sequence conservation and location of duplicated (paralogous) Alu J elements in the MHC class I region show that the beta and kappa blocks have evolved separately from the alpha block beginning at a time before or during the evolution of Alu J elements in primates. Received: 22 September 1999 / Accepted: 24 January 2000

9.
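In the genomes of higher plants, most sequences are non-expressed and gene sequences make up only a small proportion, so understanding how genes are distributed across the genome is an important aspect of studying genome structure. With funding from the U.S. Department of Energy, the genome of a Populus trichocarpa clone has been sequenced and released to the public. The completed whole-genome sequence of poplar offers a particular case for understanding gene distribution in tree genomes. In this paper, we used Poisson analysis to test gene density on each chromosome of the poplar genome; the results show that gene content differs significantly among poplar chromosomes. The poplar genome sequencing project revealed that the modern poplar genome originated from an ancient whole-genome duplication event (termed the Salicoid genome duplication), so large homologous duplicated segments exist between different poplar chromosomes. Our study shows, however, that for most highly homologous chromosomes in the poplar genome, gene density bears no obvious relationship to the homology between chromosomes. This indicates that, after the Salicoid whole-genome duplication, genes on the highly homologous chromosomes were lost, and at different rates. We also aligned and analyzed nearly 90,000 P. trichocarpa EST sequences; the genes covered by these ESTs account for only about 16.8% of all genes in the poplar genome. Although EST sequencing is an important means of gene discovery, small-scale EST sequencing covers genes poorly, so its practical value is limited.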
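
A minimal sketch of a Poisson-style test of per-chromosome gene density, assuming a null model in which genes are spread uniformly along the genome, so that each chromosome's expected gene count is proportional to its length. The abstract does not give the exact procedure; the normal approximation to the Poisson tail, the function name `poisson_density_test`, and the toy counts in the usage example are assumptions for illustration, not data from the study.

```python
import math


def poisson_density_test(gene_counts, lengths_mb):
    """Compare observed gene counts with a uniform-density Poisson null.

    gene_counts and lengths_mb are dicts keyed by chromosome name.
    Returns {chromosome: (expected_count, z_score, two_sided_p)}.
    The p-value uses the normal approximation to the Poisson distribution,
    which is reasonable when expected counts are large.
    """
    total_genes = sum(gene_counts.values())
    total_length = sum(lengths_mb.values())
    rate = total_genes / total_length        # genes per Mb under the null
    results = {}
    for chrom, observed in gene_counts.items():
        expected = rate * lengths_mb[chrom]
        z = (observed - expected) / math.sqrt(expected)  # Poisson var = mean
        p = math.erfc(abs(z) / math.sqrt(2))             # two-sided tail
        results[chrom] = (expected, z, p)
    return results


# Hypothetical toy input: two equally long chromosomes, very unequal counts.
toy = poisson_density_test({"chrA": 300, "chrB": 100},
                           {"chrA": 10.0, "chrB": 10.0})
```

With many chromosomes tested at once, a multiple-testing correction on the resulting p-values would normally be applied before calling any single chromosome gene-rich or gene-poor.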

10.
A 119-kb bacterial artificial chromosome from the JOINTLESS locus on the tomato (Lycopersicon esculentum) chromosome 11 contained 15 putative genes. Repetitive sequences in this region include one copia-like LTR retrotransposon, 13 simple sequence repeats, three copies of a novel type III foldback transposon, and four putative short DNA repeats. Database searches showed that the foldback transposon and the short DNA repeats seemed to be associated preferentially with genes. The predicted tomato genes were compared with the complete Arabidopsis genome. Eleven out of 15 tomato open reading frames were found to be colinear with segments on five Arabidopsis bacterial artificial chromosome/P1-derived artificial chromosome clones. The synteny patterns, however, did not reveal duplicated segments in Arabidopsis, where over half of the genome is duplicated. Our analysis indicated that the microsynteny between the tomato and Arabidopsis genomes was still conserved at a very small scale but was complicated by the large number of gene families in the Arabidopsis genome.

11.
A census of protein repeats.   Total citations: 20; self-citations: 0; citations by others: 20
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find that repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting that most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences containing small and relatively water-soluble residues are favored. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

12.
Chloroplast genome organization, gene order, and content are highly conserved among land plants. We sequenced the chloroplast genome of Trachelium caeruleum L. (Campanulaceae), a member of an angiosperm family known for highly rearranged genomes. The total genome size is 162,321 bp, with an inverted repeat (IR) of 27,273 bp, large single-copy (LSC) region of 100,114 bp, and small single-copy (SSC) region of 7,661 bp. The genome encodes 112 different genes, with 17 duplicated in the IR, a tRNA gene (trnI-cau) duplicated once in the LSC region, and a protein-coding gene (psbJ) with two duplicate copies, for a total of 132 putatively intact genes. ndhK may be a pseudogene with internal stop codons, and clpP, ycf1, and ycf2 are so highly diverged that they also may be pseudogenes. ycf15, rpl23, infA, and accD are truncated and likely nonfunctional. The most conspicuous feature of the Trachelium genome is the presence of 18 internally unrearranged blocks of genes inverted or relocated within the genome relative to the ancestral gene order of angiosperm chloroplast genomes. Recombination between repeats or tRNA genes has been suggested as a mechanism of chloroplast genome rearrangements. The Trachelium chloroplast genome shares with Pelargonium and Jasminum both a higher number of repeats and larger repeated sequences in comparison to eight other angiosperm chloroplast genomes, and these are concentrated near rearrangement endpoints. Genes for tRNAs occur at many but not all inversion endpoints, so some combination of repeats and tRNA genes may have mediated these rearrangements.
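
The sizes reported for the Trachelium genome can be cross-checked with simple arithmetic, assuming the usual quadripartite layout (one LSC, one SSC, and two IR copies) and reading the gene tally as 112 different genes plus one extra copy for each of the 17 IR genes, one extra copy of trnI-cau, and two extra copies of psbJ. The function and variable names below are illustrative only.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastid genome length with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes in bp, as reported in the abstract.
total_bp = quadripartite_length(lsc=100_114, ssc=7_661, ir=27_273)

# Gene tally under the assumed reading of the abstract:
# 112 different genes + 17 IR duplicates + 1 trnI-cau copy + 2 psbJ copies.
total_genes = 112 + 17 + 1 + 2

print(total_bp, total_genes)  # 162321 132
```

Both results match the totals stated in the abstract (162,321 bp and 132 putatively intact genes).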

13.
The sequence determination of several genomic clones isolated from the Mediterranean fruitfly Ceratitis capitata identified the existence of opa-like repeats, often more than one being clustered in small chromosomal segments. These repeats have previously been shown to consist of stretches of tandemly reiterated glutamine-encoding residues, and they are found in multiple genes of several organisms. Most of the repeats described here are flanked or interrupted by stop codons in all reading frames and, thus, could not possibly be part of protein-coding sequences. Furthermore, these repeats, of which there are several hundred in the genome of the Medfly, can be used effectively for the determination of sequence polymorphisms, providing a convenient approach to obtain additional landmarks for the construction of genomic maps of this economically important insect. This paper is dedicated to the memory of our colleague and friend Dr. Jim Flach, who took part in the initial phase of this work and died during the course of the investigation.

14.
15.
Abundant repetitive DNA sequences are an enigmatic part of the human genome. Despite increasing evidence on the functionality of DNA repeats, their biologic role is still elusive and under frequent debate. Macrosatellites are the largest of the tandem DNA repeats, located on one or multiple chromosomes. The contribution of macrosatellites to genome regulation and human health was demonstrated for the D4Z4 macrosatellite repeat array on chromosome 4q35. Reduced copy number of D4Z4 repeats is associated with local euchromatinization and the onset of facioscapulohumeral muscular dystrophy. Although the role other macrosatellite families may play remains rather obscure, their diverse functionalities within the genome are being gradually revealed. In this review, we will outline structural and functional features of coding and noncoding macrosatellite repeats, and highlight recent findings that bring these sequences into the spotlight of genome organization and disease development.

16.
Our objective was to test whether or not cyclization recombination (CRE), the P1 phage site-specific recombinase, induces genome rearrangements in plastids. Testing was carried out in tobacco plants in which a DNA sequence, located between two inversely oriented locus of X-over of P1 (loxP) sites, underwent repeated cycles of inversions as a means of monitoring CRE activity. We report here that CRE mediates deletions between loxP sites and plastid DNA sequences in the 3'rps12 gene leader (lox-rps12) or in the psbA promoter core (lox-psbA). We also observed deletions between two directly oriented lox-psbA sites, but not between lox-rps12 sites. Deletion via duplicated rRNA operon promoter (Prrn) sequences was also frequent in CRE-active plants. However, CRE-mediated recombination is probably not directly involved, as no recombination junction between loxP and Prrn could be observed. Tobacco plants carrying deleted genomes as a minor fraction of the plastid genome population were fertile and phenotypically normal, suggesting that the absence of deleted genome segments was compensated by gene expression from wild-type copies. The deleted plastid genomes disappeared in the seed progeny lacking CRE. Observed plastid genome rearrangements are specific to engineered plastid genomes, which contain at least one loxP site or duplicated psbA promoter sequences. The wild-type plastid genome is expected to be stable, even if CRE is present in the plastid.

17.
Hughes AL, Friedman R. Genetica, 2004, 121(2): 181-185
Statistical analysis of the distribution of transposable elements (TEs) and tRNA genes in the genome of the yeast Saccharomyces cerevisiae indicated that, although tRNA genes and other genes transcribed by RNA polymerase III are targets for TE insertion, the distribution of TEs was significantly more clumped than that of tRNAs. Genomic blocks putatively duplicated as the result of an ancient polyploidization event contained fewer TEs than expected from their length, and nearly two thirds of duplicated blocks lacked TEs altogether. In addition, the edges of duplicated blocks tended to be located in TE-poor genomic regions. These results can be explained by two hypotheses: (1) that transposition events have occurred well after block duplication; and (2) that TEs have frequently played a role in genomic rearrangement events in yeast. According to this model, duplicated blocks identifiable as such in the present-day yeast genome are found largely in regions with low TE density because in such regions the duplicated structure has not been obscured by TE-mediated rearrangements.
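
One simple way to quantify how "clumped" a set of genomic features is, sketched here in Python, is the index of dispersion (variance-to-mean ratio) of feature counts in fixed-size windows: values near 1 are what a random (Poisson) scatter would give, and values well above 1 indicate clustering. The abstract does not say which statistic was used, so this choice, the function name `dispersion_index`, and the toy coordinates are assumptions for illustration.

```python
import math
from statistics import mean, variance


def dispersion_index(positions, genome_length, window):
    """Variance-to-mean ratio of feature counts per window.

    positions are 0-based coordinates smaller than genome_length.
    Requires at least two windows and at least one feature.
    """
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1       # assign each feature to its window
    return variance(counts) / mean(counts)


# Toy coordinates on a 100 bp "genome" with 10 bp windows.
evenly_spaced = list(range(0, 100, 10))  # one feature in every window
clumped = list(range(10))                # all ten features in the first window

even_index = dispersion_index(evenly_spaced, 100, 10)
clumped_index = dispersion_index(clumped, 100, 10)
```

Comparing this index for two feature classes over the same windows, for example TEs against tRNA genes, gives a rough sense of which is more clustered; a formal test would still be needed to call the difference significant.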

18.
The nontransforming Epstein-Barr virus (EBV) strain P3HR-1 is known to have a deletion of sequences of the long unique region adjacent to the large internal repeats. The deleted region is believed to be required for initiation of transformation. To establish a more detailed map of the deletion in P3HR-1 virus, SalI-A of the transforming strain M-ABA and of P3HR-1 virus was cloned into the cosmid vector pHC79 and multiplied in Escherichia coli. The cleavage sites for BamHI, BglII, EcoRI, PstI, SacI, SacII, and XhoI were determined in the recombinant plasmid clones. Analysis of the boundary between large internal repeats and the long unique region showed that in M-ABA (EBV) the transition is different from that in B95-8 virus. The map established for SalI-A of P3HR-1 virus revealed that, in contrast to previous reports, the deletion has a size of 6.5 kilobase pairs. It involves the junction between large internal repeats and the long unique region and includes more than half of the rightmost large internal repeat. The site of the deletion in the long unique region is located between a SacI and a SacII site, about 200 base pairs apart from each other. The sequences neighboring the deletion in the long unique region showed homology to the nonrepeated sequences of the DS(R) (duplicated sequence, right) region. Sequences of the large internal repeat are thus fused to sequences of the DS(L) (duplicated sequence, left) region in P3HR-1 virus DNA under elimination of the DS(L) repeats. Jijoye, the parental Burkitt lymphoma cell line from which the P3HR-1 line is derived by single-cell cloning, is known to produce a transforming virus. Analysis of the Jijoye (EBV) genome with cloned M-ABA (EBV) probes specific for the sequences missing in P3HR-1 virus revealed that the sequences of M-ABA (EBV) BamHI-H2 are not represented in Jijoye (EBV). In Jijoye (EBV) the complete DS(L) region including the DS(L) repeats is, however, conserved. Further analysis of Jijoye (EBV) and of Jijoye virus-transformed cell lines will be helpful to narrow down the region required for transformation.

19.
The human genome is a mosaic of isochores, which are long DNA segments (300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores.

20.
The chloroplast genome sequence of Coffea arabica L., the first sequenced member of the fourth largest family of angiosperms, Rubiaceae, is reported. The genome is 155 189 bp in length, including a pair of inverted repeats of 25 943 bp. Of the 130 genes present, 112 are distinct and 18 are duplicated in the inverted repeat. The coding region comprises 79 protein genes, 29 transfer RNA genes, four ribosomal RNA genes and 18 genes containing introns (three with three exons). Repeat analysis revealed five direct and three inverted repeats of 30 bp or longer with a sequence identity of 90% or more. Comparisons of the coffee chloroplast genome with sequenced genomes of the closely related family Solanaceae indicated that coffee has a portion of rps19 duplicated in the inverted repeat and an intact copy of infA. Furthermore, whole-genome comparisons identified large indels (> 500 bp) in several intergenic spacer regions and introns in the Solanaceae, including the trnE(UUC)–trnT(GGU) spacer, ycf4–cemA spacer, trnI(GAU) intron and rrn5–trnR(ACG) spacer. Phylogenetic analyses based on the DNA sequences of 61 protein-coding genes for 35 taxa, performed using both maximum parsimony and maximum likelihood methods, strongly supported the monophyly of several major clades of angiosperms, including monocots, eudicots, rosids, asterids, eurosids II, and euasterids I and II. Coffea (Rubiaceae, Gentianales) is only the second order sampled from the euasterid I clade. The availability of the complete chloroplast genome of coffee provides regulatory and intergenic spacer sequences for utilization in chloroplast genetic engineering to improve this important crop.
