Similar literature (20 results found)
1.
Characterization of pre-insertion loci of de novo L1 insertions (total citations: 1; self-citations: 0; citations by others: 1)
The human Long Interspersed Element-1 (LINE-1) and the Short Interspersed Element (SINE) Alu comprise 28% of the human genome. They share the same L1-encoded endonuclease for insertion, which recognizes an A+T-rich sequence. Under a simple model of insertion distribution, this nucleotide preference would lead to the prediction that the populations of both elements would be biased towards A+T-rich regions. Genomic L1 elements do show an A+T-rich bias. In contrast, Alu is biased towards G+C-rich regions when compared to the genome average. Several analyses have demonstrated that relatively recent insertions of both elements show less G+C content bias relative to older elements. We have analyzed the repetitive element and G+C composition of more than 100 pre-insertion loci derived from de novo L1 insertions in cultured human cancer cells, which should represent an evolutionarily unbiased set of insertions. An A+T-rich bias is observed in the 50 bp flanking the endonuclease target site, consistent with the known target site for the L1 endonuclease. The L1, Alu, and G+C content of 20 kb of the de novo pre-insertion loci shows a different set of biases than that observed for fixed L1s in the human genome. In contrast to the insertion sites of genomic L1s, the de novo L1 pre-insertion loci are relatively L1-poor, Alu-rich and G+C neutral. Finally, a statistically significant cluster of de novo L1 insertions was localized in the vicinity of the c-myc gene. These results suggest that the initial insertion preference of L1, while A+T-rich in the initial vicinity of the break site, can be influenced by the broader content of the flanking genomic region and have implications for understanding the dynamics of L1 and Alu distributions in the human genome.

2.
Endogenous retroviruses (ERVs), the remnants of retroviral infections in the germ line, occupy ~8% and ~10% of the human and mouse genomes, respectively, and affect their structure, evolution, and function. Yet we still have a limited understanding of how the genomic landscape influences integration and fixation of ERVs. Here we conducted a genome-wide study of the most recently active ERVs in the human and mouse genome. We investigated 826 fixed and 1,065 in vitro HERV-Ks in human, and 1,624 fixed and 242 polymorphic ETns, as well as 3,964 fixed and 1,986 polymorphic IAPs, in mouse. We quantitated >40 human and mouse genomic features (e.g., non-B DNA structure, recombination rates, and histone modifications) in ±32 kb of these ERVs’ integration sites and in control regions, and analyzed them using Functional Data Analysis (FDA) methodology. In one of the first applications of FDA in genomics, we identified genomic scales and locations at which these features display their influence, and how they work in concert, to provide signals essential for integration and fixation of ERVs. The investigation of ERVs of different evolutionary ages (young in vitro and polymorphic ERVs, older fixed ERVs) allowed us to disentangle integration vs. fixation preferences. As a result of these analyses, we built a comprehensive model explaining the uneven distribution of ERVs along the genome. We found that ERVs integrate in late-replicating AT-rich regions with abundant microsatellites, mirror repeats, and repressive histone marks. Regions favoring fixation are depleted of genes and evolutionarily conserved elements, and have low recombination rates, reflecting the effects of purifying selection and ectopic recombination removing ERVs from the genome. In addition to providing these biological insights, our study demonstrates the power of exploiting multiple scales and localization with FDA. These powerful techniques are expected to be applicable to many other genomic investigations.  

3.
DNA methylation of CpGs located in two types of repetitive elements, LINE1 (L1) and Alu, is used to assess “global” changes in DNA methylation in studies of human disease and environmental exposure. L1 and Alu contribute close to 30% of all base pairs in the human genome, and transposition of repetitive elements is repressed through DNA methylation. Few studies have investigated whether repetitive element DNA methylation is associated with DNA methylation at other genomic regions, or the biological and technical factors that influence potential associations. Here, we assess L1 and Alu DNA methylation by Pyrosequencing of consensus sequences and using subsets of probes included in the Illumina Infinium HumanMethylation27 BeadChip array. We show that evolutionary age and assay method affect the assessment of repetitive element DNA methylation. Additionally, we compare Pyrosequencing results for repetitive elements to average DNA methylation of CpG islands, as assessed by array probes classified into strong, weak and non-islands. We demonstrate that each of these dispersed sequences exhibits different patterns of tissue-specific DNA methylation. Correlation of DNA methylation suggests an association between L1 and weak CpG island DNA methylation in some of the tissues examined. We caution, however, that L1, Alu and CpG island DNA methylation are distinct measures of dispersed DNA methylation and one should not be used in lieu of another. Analysis of DNA methylation data is complex and assays may be influenced by environment and pathology in different or complementary ways.

4.
BACKGROUND/AIMS: The L1 retrotransposable element family is the most successful self-replicating genomic parasite of the human genome. L1 elements drive replication of Alu elements, and both have had far-reaching impacts on the human genome. We use L1 and Alu insertion polymorphisms to analyze human population structure. METHODS: We genotyped 75 recent, polymorphic L1 insertions in 317 individuals from 21 populations in sub-Saharan Africa, East Asia, Europe and the Indian subcontinent. This is the first sample of L1 loci large enough to support detailed population genetic inference. We analyzed these data in parallel with a set of 100 polymorphic Alu insertion loci previously genotyped in the same individuals. RESULTS AND CONCLUSION: The data sets yield congruent results that support the recent African origin model of human ancestry. A genetic clustering algorithm detects clusters of individuals corresponding to continental regions. The number of loci sampled is critical: with fewer than 50 typical loci, structure cannot be reliably discerned in these populations. The inclusion of geographically intermediate populations (from India) reduces the distinctness of clustering. Our results indicate that human genetic variation is neither perfectly correlated with geographic distance (purely clinal) nor independent of distance (purely clustered), but a combination of both: stepped clinal.

5.
LINE1s occupy 17% of the human genome and are its only active autonomous mobile DNA. L1s are also responsible for genomic insertion of processed pseudogenes and >1 million non-autonomous retrotransposons (Alus and SVAs). These elements have significant effects on gene organization and expression. Despite the importance of retrotransposons for genome evolution, much about their biology remains unknown, including cellular factors involved in the complex processes of retrotransposition and forming and transporting L1 ribonucleoprotein particles. By co-immunoprecipitation of tagged L1 constructs and mass spectrometry, we identified proteins associated with the L1 ORF1 protein and its ribonucleoprotein. These include RNA transport proteins, gene expression regulators, post-translational modifiers, helicases and splicing factors. Many cellular proteins co-localize with L1 ORF1 protein in cytoplasmic granules. We also assayed the effects of these proteins on cell culture retrotransposition and found strong inhibiting proteins, including some that control HIV and other retroviruses. These data suggest candidate cofactors that interact with the L1 to modulate its activity and increase our understanding of the means by which the cell coexists with these genomic ‘parasites’.

6.
7.

Background

There are over half a million copies of L1 retroelements in the human genome, which are responsible for as much as 0.5% of new human genetic diseases. Most new L1 inserts arise from young source elements that are polymorphic in the human genome. Highly active polymorphic “hot” L1 source elements have been shown to be capable of extremely high levels of mobilization and result in numerous instances of disease. Additionally, hot polymorphic L1s have been described as highly active within numerous cancer genomes. These hot L1s result in mutagenesis by insertion of new L1 copies elsewhere in the genome, but have also been shown to generate additional full-length L1 insertions that are also hot and able to further retrotranspose. Through this mechanism, hot L1s may amplify within a tumor and result in a continued cycle of mutagenesis.

Results and conclusions

We have developed a method to detect full-length, polymorphic L1 elements using a targeted next generation sequencing approach, Sequencing Identification and Mapping of Primed L1 Elements (SIMPLE). SIMPLE has 94% sensitivity and detects nearly all full-length L1 elements in a genome. SIMPLE will allow researchers to identify hot mutagenic full-length L1s as potential drivers of genome instability. Using SIMPLE we find that the typical individual has approximately 100 non-reference, polymorphic L1 elements in their genome. These elements occur at low population frequencies relative to previously identified polymorphic L1 elements and demonstrate the tremendous diversity in potentially active L1 elements in the human population.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1374-y) contains supplementary material, which is available to authorized users.

8.
Konkel MK, Wang J, Liang P, Batzer MA. Gene, 2007, 390(1-2): 28-38
Mobile elements represent a relatively new class of markers for the study of human evolution. Long interspersed elements (LINEs) belong to a group of retrotransposons comprising approximately 21% of the human genome. Young LINE-1 (L1) elements that have integrated recently into the human genome can be polymorphic for insertion presence/absence in different human populations at particular chromosomal locations. To identify putative novel L1 insertion polymorphisms, we computationally compared two draft assemblies of the whole human genome (Public and Celera Human Genome assemblies). We identified a total of 148 potential polymorphic L1 insertion loci, among which 73 were candidates for novel polymorphic loci. Based on additional analyses we selected 34 loci for further experimental studies. PCR-based assays and DNA sequence analysis were performed for these 34 loci in 80 unrelated individuals from four diverse human populations: African-American, Asian, Caucasian, and South American. All but two of the selected loci were confirmed as polymorphic in our human population panel. Approximately 47% of the analyzed loci integrated into other repetitive elements, most commonly older L1s. One of the insertions was accompanied by a BC200 sequence. Collectively, these mobile elements represent a valuable source of genomic polymorphism for the study of human population genetics. Our results also suggest that the exhaustive identification of L1 insertion polymorphisms is far from complete, and new whole genome sequences are valuable sources for finding novel retrotransposon insertion polymorphisms.

9.
Gasior SL, Preston G, Hedges DJ, Gilbert N, Moran JV, Deininger PL. Gene, 2007, 390(1-2): 190-198
The human Long Interspersed Element-1 (LINE-1) and the Short Interspersed Element (SINE) Alu comprise 28% of the human genome. They share the same L1-encoded endonuclease for insertion, which recognizes an A+T-rich sequence. Under a simple model of insertion distribution, this nucleotide preference would lead to the prediction that the populations of both elements would be biased towards A+T-rich regions. Genomic L1 elements do show an A+T-rich bias. In contrast, Alu is biased towards G+C-rich regions when compared to the genome average. Several analyses have demonstrated that relatively recent insertions of both elements show less G+C content bias relative to older elements. We have analyzed the repetitive element and G+C composition of more than 100 pre-insertion loci derived from de novo L1 insertions in cultured human cancer cells, which should represent an evolutionarily unbiased set of insertions. An A+T-rich bias is observed in the 50 bp flanking the endonuclease target site, consistent with the known target site for the L1 endonuclease. The L1, Alu, and G+C content of 20 kb of the de novo pre-insertion loci shows a different set of biases than that observed for fixed L1s in the human genome. In contrast to the insertion sites of genomic L1s, the de novo L1 pre-insertion loci are relatively L1-poor, Alu-rich and G+C neutral. Finally, a statistically significant cluster of de novo L1 insertions was localized in the vicinity of the c-myc gene. These results suggest that the initial insertion preference of L1, while A+T-rich in the initial vicinity of the break site, can be influenced by the broader content of the flanking genomic region and have implications for understanding the dynamics of L1 and Alu distributions in the human genome.

10.
Insertion specificity of mobile genetic elements is a rather complex aspect of DNA transposition, which, despite much progress towards its elucidation, still remains incompletely understood. We report here the results of a meta-analysis of IS2 target sites from genomic, phage, and plasmid DNA and find that newly acquired IS2 elements are consistently inserted around abrupt DNA compositional shifts, particularly in the form of switch sites of GC skew. The results presented in this study not only corroborate our previous observations that both the insertion sequence (IS) minicircle junction and target region adopt intrinsically bent conformations in IS2, but most interestingly, extend this requirement to other families of IS elements. Using this information, we were able to pinpoint regions with high propensity for transposition and to predict and detect, de novo, a novel IS2 insertion event in the 3′ region of the gfp gene of a reporter plasmid. We also found that during amplification of this plasmid, process parameters such as scale, culture growth phase, and medium composition exacerbate IS2 transposition, leading to contamination levels with potentially detrimental clinical effects. Overall, our findings provide new insights into the role of target DNA structure in the mechanism of transposition of IS elements and extend our understanding of how culture conditions are a relevant factor in the induction of genetic instability.

11.
L1 elements are mammalian retrotransposons contributing to genome evolution and causing rare mutations in humans. We describe a de novo insertion of an L1 element into the dystrophin gene resulting in skipping of exon 44 and causing Duchenne muscular dystrophy in a boy. The L1 element was rearranged due to the twin-priming mechanism, but contrary to all described L1 rearrangements, the 5' region of the inverted L1 sequence ended within the poly(A) tail of the element. Furthermore, the target site for the insertion was located only 87 bp from the insertion site in another patient described previously. These findings can contribute to the understanding of the mechanisms of L1 element rearrangement, and may support the notion that some subregions of the human genome could be preferred targets for retroelements using the L1 enzymatic machinery.

12.
The preTa subfamily of long interspersed elements (LINEs) is characterized by a three base-pair "ACG" sequence in the 3' untranslated region, contains approximately 400 members in the human genome, and has a low level of nucleotide divergence, with an estimated average age of 2.34 million years, suggesting that expansion of the L1 preTa subfamily occurred just after the divergence of humans and African apes. We have identified 362 preTa L1 elements from the draft human genomic sequence, investigated the genomic characteristics of preTa L1 insertions, and screened individual elements across diverse human populations and various non-human primate species using polymerase chain reaction (PCR) assays to determine the phylogenetic origin and levels of human genomic diversity associated with the L1 elements. All of the preTa L1 elements analyzed by PCR were absent from the orthologous positions in non-human primate genomes, with 33 (14%) of the L1 elements being polymorphic with respect to insertion presence or absence in the human genome. The newly identified L1 insertion polymorphisms will prove useful as identical-by-descent genetic markers for the study of human population genetics. We provide evidence that preTa L1 elements show an integration site preference for genomic regions with low GC content. Computational analysis of the preTa L1 elements revealed that 29% of the elements amenable to complete sequence analysis have apparently escaped 5' truncation and are essentially full-length (approximately 6 kb). In all, 29 have two intact open reading frames and may be capable of retrotransposition.

13.

Background

The different regions of a genome do not evolve at the same rate. For example, comparative genomic studies have suggested that the sex chromosomes and the regions harbouring the immune defence genes in the Major Histocompatibility Complex (MHC) may evolve faster than other genomic regions. The advent of the next generation sequencing technologies has made it possible to study which genomic regions are evolutionarily liable to change and which are static, as well as enabling an increasing number of genome studies of non-model species. However, de novo sequencing of the whole genome of an organism remains non-trivial. In this study, we present the draft genome of the black grouse, which was developed using a reference-guided assembly strategy.

Results

We generated 133 Gbp of sequence data from one black grouse individual on the SOLiD platform and used a combination of de novo assembly and chicken reference genome mapping to assemble the reads into 4572 scaffolds with a total length of 1022 Mb. The draft genome covers the main chicken chromosomes 1 to 28 and Z, which have a total length of 1001 Mb, well. The draft genome is fragmented, but has good coverage of the homologous chicken genes. In particular, 33.0% of the coding regions of the homologous genes have more than 90% of their sequence covered. In addition, we identified ~1 million SNPs from the genome and identified 106 genomic regions which had a high nucleotide divergence between black grouse and chicken or between black grouse and turkey.

Conclusions

Our results support the hypothesis that the X (Z) chromosome evolves faster than the autosomes, and our data are consistent with the MHC regions being more liable to change than the genome average. Our study demonstrates how a moderate sequencing effort can be combined with existing genome references to generate a draft genome for a non-model species.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-180) contains supplementary material, which is available to authorized users.

14.
A cis-acting methylation center that signals de novo DNA methylation is located upstream of the mouse Aprt gene. In the current study, two approaches were taken to determine if tandem B1 repetitive elements found at the 3' end of the methylation center contribute to the methylation signal. First, bisulfite genomic sequencing demonstrated that CpG sites within the B1 elements were methylated at relative levels of 43% in embryonal stem cells deficient for the maintenance DNA methyltransferase when compared with wild type embryonal stem cells. Second, the ability of the B1 elements to signal de novo methylation upon stable transfection into mouse embryonal carcinoma cells was examined. This approach demonstrated that the B1 elements were methylated de novo to a high level in the embryonal carcinoma cells and that the B1 elements acted synergistically. The results from these experiments provide strong evidence that the tandem B1 repetitive elements provide a significant fraction of the methylation center signal. By extension, they also support the hypothesis that one role for DNA methylation in mammals is to protect the genome from expression and transposition of parasitic elements.

15.
M. J. Curcio, D. J. Garfinkel. Genetics, 1994, 136(4): 1245-1259
Despite the abundance of Ty1 RNA in Saccharomyces cerevisiae, Ty1 retrotransposition is a rare event. To determine whether transpositional dormancy is the result of defective Ty1 elements, functional and defective alleles of the retrotransposon in the yeast genome were quantitated. Genomic Ty1 elements were isolated by gap repair-mediated recombination of pGTy1-H3(Δ475-3944)HIS3, a multicopy plasmid containing a GAL1/Ty1-H3 fusion element lacking most of the gag domain (TYA) and the protease (PR) and integrase (IN) domains. Of 39 independent gap repaired pGTyHIS3 elements isolated, 29 (74%) transposed at high levels following galactose induction. The presence of restriction site polymorphisms within the gap repaired region of the 29 functional pGTyHIS3 elements indicated that they were derived from at least eight different genomic Ty1 elements and one Ty2 element. Of the 10 defective pGTyHIS3 elements, one was a partial gap repair event while the other nine were derived from at least six different genomic Ty1 elements. These results suggest that most genomic Ty1 elements encode functional TYA, PR and IN proteins. To understand how functional Ty1 elements are regulated, we tested the hypothesis that a TYB protein associates preferentially in cis with the RNA template that encodes it, thereby promoting transposition of its own element. A genomic Ty1 mhis3AI element containing either an in-frame insertion in PR or a deletion in TYB transposed at the same rate as a wild-type Ty1mhis3AI allele, indicating that TYB proteins act efficiently in trans. This result suggests in principle that defective genomic Ty1 elements could encode trans-acting repressors of transposition; however, expression of only one of the nine defective pGTy1 isolates had a negative effect on genomic Ty1 mhis3AI element transposition in trans, and this effect was modest. Therefore, the few defective Ty1 elements in the genome are not responsible for transpositional dormancy.

16.
While large‐scale genomic approaches are increasingly revealing the genetic basis of polymorphic phenotypes such as colour morphs, such approaches are almost exclusively conducted in species with high‐quality genomes and annotations. Here, we use Pool‐Seq data for both genome assembly and SNP frequency estimation, followed by scanning for FST outliers to identify divergent genomic regions. Using paired‐end, short‐read sequencing data from two groups of individuals expressing divergent phenotypes, we generate a de novo rough‐draft genome, identify SNPs and calculate genomewide FST differences between phenotypic groups. As genomes generated by Pool‐Seq data are highly fragmented, we also present an approach for super‐scaffolding contigs using existing protein‐coding data sets. Using this approach, we reanalysed genomic data from two recent studies of birds and butterflies investigating colour pattern variation and replicated their core findings, demonstrating the accuracy and power of a Pool‐Seq‐only approach. Additionally, we discovered new regions of high divergence and new annotations that together suggest novel parallels between birds and butterflies in the origins of their colour pattern variation.
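As an illustration of the FST outlier scan mentioned in this abstract, the following is a minimal sketch, assuming that each SNP already has an allele frequency estimate for each of the two phenotypic pools. It uses the textbook heterozygosity-based definition FST = (HT - HS) / HT for a biallelic site with equally weighted groups, which is not necessarily the estimator the authors used and which ignores the pool-size and read-depth corrections a real Pool-Seq analysis would need. The function names, SNP identifiers, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def fst_biallelic(p1: float, p2: float) -> float:
    """FST for one biallelic SNP from the allele frequencies of two groups.

    Uses FST = (HT - HS) / HT with equal group weights (an assumption).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # site is monomorphic across both groups
    # mean expected heterozygosity within the two groups
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def fst_outliers(
    freqs: Dict[str, Tuple[float, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (snp_id, fst) pairs whose FST is at or above the threshold."""
    scored = [(snp, fst_biallelic(p1, p2)) for snp, (p1, p2) in freqs.items()]
    return [(snp, fst) for snp, fst in scored if fst >= threshold]


if __name__ == "__main__":
    # Hypothetical allele frequencies in the two phenotypic pools.
    example = {
        "snp_a": (0.50, 0.50),
        "snp_b": (0.10, 0.90),
        "snp_c": (0.45, 0.55),
    }
    for snp, fst in fst_outliers(example):
        print(snp, round(fst, 3))
```

In practice, outliers are usually defined relative to the genome-wide FST distribution (for example, an upper percentile) rather than a fixed cut-off; the fixed threshold here only keeps the sketch short.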

17.
A principal obstacle to completing maps and analyses of the human genome involves the genome’s “inaccessible” regions: sequences (often euchromatic and containing genes) that are isolated from the rest of the euchromatic genome by heterochromatin and other repeat-rich sequence. We describe a way to localize these sequences by using ancestry linkage disequilibrium in populations that derive ancestry from at least three continents, as is the case for Latinos. We used this approach to map the genomic locations of almost 20 megabases of sequence unlocalized or missing from the current human genome reference (NCBI Genome GRCh37), a substantial fraction of the human genome’s remaining unmapped sequence. We show that the genomic locations of most sequences that originated from fosmids and larger clones can be admixture mapped in this way, by using publicly available whole-genome sequence data. Genome assembly efforts and future builds of the human genome reference will be strongly informed by this localization of genes and other euchromatic sequences that are embedded within highly repetitive pericentromeric regions.

18.
Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs) that are mostly nested within each other, of which most families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible in this region for most disruptions of synteny with sorghum and rice. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These results, performed on ∼1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses.

19.
The long interspersed element-1 (LINE-1 or L1) and Alu elements are the most abundant mobile elements, comprising 21% and 11% of the human genome, respectively. Since the divergence of human and chimpanzee lineages, these elements have vigorously created chromosomal rearrangements causing genomic differences between humans and chimpanzees by either increasing or decreasing the size of the genome. Here, we report an exotic mechanism, retrotransposon recombination-mediated inversion (RRMI), that usually does not alter the amount of genomic material present. Through the comparison of the human and chimpanzee draft genome sequences, we identified 252 inversions whose respective inversion junctions can clearly be characterized. Our results suggest that L1 and Alu elements cause chromosomal inversions by either forming a secondary structure or providing a fragile site for double-strand breaks. The detailed analysis of the inversion breakpoints showed that L1 and Alu elements are responsible for at least 44% of the 252 inversion loci between human and chimpanzee lineages, including 49 RRMI loci. Among them, three RRMI loci inverted exonic regions in known genes, which implicates this mechanism in generating the genomic and phenotypic differences between human and chimpanzee lineages. This study is the first comprehensive analysis of mobile element-based inversion breakpoints between human and chimpanzee lineages, and highlights their role in primate genome evolution.

20.