Similar Articles
20 similar articles found
1.
Whole genome sequencing studies are essential for a comprehensive understanding of the vast patterns of human genomic variation. Here we report the results of a high-coverage whole genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (average 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, the majority of which (∼96%) have a minor allele frequency below 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely “knock out” the corresponding genes. Across all 44 genomes, a total of 182 genes were “knocked out” in at least one individual genome, 46 of which were “knocked out” in over 30% of our samples, suggesting that a number of genes are commonly “knocked out” in the general population. Gene ontology analysis suggested that these commonly “knocked-out” genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and disease.
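The knockout tallies in this abstract come down to simple counting. The following is a minimal sketch, not the study's pipeline. It assumes a hypothetical input of (sample, gene, zygosity) records for predicted loss-of-function (LoF) variants. A gene counts as "knocked out" in a sample only when that sample carries a homozygous LoF variant, so compound heterozygotes are ignored here. A gene counts as "commonly" knocked out when this is true in more than 30% of samples.

```python
from collections import defaultdict


def knocked_out_genes(lof_calls, n_samples, min_fraction=0.30):
    """Tally genes "knocked out" by homozygous loss-of-function (LoF) variants.

    lof_calls: iterable of (sample_id, gene, zygosity) tuples, where zygosity
    is "hom" or "het" (a hypothetical input format, not the study's).
    Returns (all_ko, common_ko): genes knocked out in at least one sample, and
    genes knocked out in more than min_fraction of the n_samples samples.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene, zygosity in lof_calls:
        if zygosity == "hom":  # only a homozygous LoF call disables both copies
            samples_by_gene[gene].add(sample_id)

    all_ko = set(samples_by_gene)
    common_ko = {
        gene
        for gene, samples in samples_by_gene.items()
        if len(samples) / n_samples > min_fraction
    }
    return all_ko, common_ko


# Toy example with four samples (made-up genes and genotypes).
calls = [
    ("s1", "GENE_A", "hom"),
    ("s2", "GENE_A", "hom"),
    ("s3", "GENE_B", "hom"),
    ("s4", "GENE_C", "het"),
]
all_ko, common_ko = knocked_out_genes(calls, n_samples=4)
print(sorted(all_ko), sorted(common_ko))  # ['GENE_A', 'GENE_B'] ['GENE_A']
```

In this toy data, GENE_C carries only a heterozygous call, so it is never counted as knocked out.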

2.
The inbred mouse is an invaluable model for human biology and disease. Nevertheless, when considering genetic mechanisms of variation and disease, it is important to appreciate the significant differences in the spectra of spontaneous mutations that distinguish these species. While insertions of transposable elements are responsible for only ~0.1% of de novo mutations in humans, the figure is 100-fold higher in the laboratory mouse. This striking difference is largely due to the ongoing activity of mouse endogenous retroviral elements. Here we briefly review mouse endogenous retroviruses (ERVs) and their influence on gene expression, analyze mechanisms of interaction between ERVs and the host cell, and summarize the variety of mutations caused by ERV insertions. The prevalence of mouse ERV activity indicates that the genome of the laboratory mouse is presently behind in the “arms race” against invasion.

3.
Bats are increasingly recognized as reservoir species for a variety of zoonotic viruses that pose severe threats to human health. While many RNA viruses have been identified in bats, little is known about bat retroviruses. Endogenous retroviruses (ERVs) are genomic fossils of past retroviral infections and thus can inform us about the diversity and history of retroviruses that have infected a species lineage. Here, we took advantage of the availability of a high-quality genome assembly for the little brown bat, Myotis lucifugus, to systematically identify and analyze ERVs in this species. We mined an initial set of 362 potentially complete proviruses from the three main classes of ERVs, which phylogenetic analysis further resolved into 13 major families and 86 subfamilies. Consensus or representative sequences for each of the 86 subfamilies were then merged with the Repbase collection of known ERV/long terminal repeat (LTR) elements to annotate the retroviral complement of the bat genome. The results show that nearly 5% of the genome assembly is occupied by ERV-derived sequences, a proportion comparable to findings for other eutherian mammals. About one-fourth of these sequences belong to subfamilies newly identified in this study. Using two independent methods, intraelement LTR divergence and analysis of orthologous loci in two other bat species, we found that the vast majority of the potentially complete proviruses identified in M. lucifugus were integrated in the last ∼25 million years. All three major ERV classes include recently integrated proviruses, suggesting that a wide diversity of retroviruses is still circulating in Myotis bats.
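The first dating method named here, intraelement LTR divergence, rests on the fact that a provirus's two LTRs are identical at integration and then diverge independently. Age can therefore be estimated as T = K / (2r), where K is the substitution distance between the two LTRs and r is the neutral substitution rate. The following is a minimal Python sketch under several assumptions:

- the LTRs are already aligned;
- K is computed with the Jukes-Cantor (JC69) correction;
- the rate in the example is an arbitrary placeholder, not the value used in the M. lucifugus study.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor (JC69) correction of a raw proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def ltr_insertion_age(ltr5, ltr3, rate_per_site_per_year):
    """Estimate provirus age in years as T = K / (2r) from its two LTRs.

    ltr5, ltr3: the 5' and 3' LTRs, already aligned to equal length ('-' = gap).
    rate_per_site_per_year: neutral substitution rate r supplied by the caller.
    The factor 2 reflects that both LTRs start identical at integration and
    then accumulate substitutions independently.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be pre-aligned to the same length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped alignment columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    k = jukes_cantor(differences / compared)
    return k / (2.0 * rate_per_site_per_year)


# One mismatch in 100 ungapped sites, with a placeholder rate of 2e-9 per site per year.
age = ltr_insertion_age("A" * 100, "A" * 99 + "C", rate_per_site_per_year=2e-9)
print(f"{age / 1e6:.2f} million years")  # 2.52 million years
```

The estimate scales inversely with the assumed rate, so the choice of r matters more than the choice of distance correction at low divergence.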

4.
Humans share about 99% of their genomic DNA with chimpanzees and bonobos; thus, the differences between these species are unlikely to lie in gene content and could instead be caused by inherited changes in regulatory systems. Endogenous retroviruses (ERVs) comprise approximately 5% of the human genome. The long terminal repeats (LTRs) of ERVs contain many regulatory sequences, such as promoters, enhancers, polyadenylation signals and factor-binding sites. Thus, they can influence the expression of nearby human genes. All known human-specific LTRs belong to the HERV-K (human ERV) family, the most active family in the human genome. It is likely that some of these ERVs integrated into regulatory regions of the human genome and thereby influenced the expression of adjacent genes, which may in turn have contributed to human evolution. This review discusses possible functional consequences of ERV integration in active coding regions.

5.
Endogenous retroviral elements (ERVs) in mice are significant genomic mutagens, causing ~10% of all reported spontaneous germ line mutations in laboratory strains. The majority of these mutations are due to insertions of two high-copy ERV families, the IAP and ETn/MusD elements. This substantial level of ongoing retrotranspositional activity suggests that inbred mice are highly variable in their content of these two ERV groups. However, no comprehensive genome-wide study has been performed to assess their level of polymorphism. Here we compared three test strains, for which sufficient genomic sequence is available, to each other and to the reference C57BL/6J genome and detected very high levels of insertional polymorphism for both ERV families, with an estimated false discovery rate of only 0.4%. Specifically, we found that at least 60% of IAP and 25% of ETn/MusD elements detected in any strain are absent in one or more of the other three strains. The polymorphic nature of a set of 40 ETn/MusD elements located within gene introns was confirmed by genomic PCR on DNA from a panel of mouse strains. In some cases, we detected gene-splicing abnormalities involving the ERV and obtained additional evidence of decreased gene expression in strains carrying the insertion. In total, we identified nearly 700 polymorphic IAP or ETn/MusD ERVs or solitary LTRs that reside in gene introns, providing candidates that may contribute to gene expression differences among strains. These extreme levels of polymorphism suggest that ERV insertions play a significant role in the genetic drift of mouse lines.
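The "at least 60% of IAP and 25% of ETn/MusD" figures amount to a set comparison. An element is polymorphic if it is detected in some strain but missing from at least one other. The following is a minimal sketch under three assumptions:

- insertion calls have already been reduced to locus IDs shared across the four strains;
- the strain names and IDs below are hypothetical;
- the false-discovery filtering described in the abstract is ignored.

```python
def polymorphic_fraction(calls_by_strain):
    """Fraction of insertions detected in any strain that are absent from at least one other strain.

    calls_by_strain: dict mapping strain name -> set of insertion locus IDs.
    Assumes loci have already been matched across strains so that the same
    insertion carries the same ID everywhere (a simplification).
    """
    loci_sets = list(calls_by_strain.values())
    if not loci_sets:
        return 0.0
    detected_anywhere = set().union(*loci_sets)
    if not detected_anywhere:
        return 0.0
    shared_by_all = set.intersection(*loci_sets)
    polymorphic = detected_anywhere - shared_by_all
    return len(polymorphic) / len(detected_anywhere)


# Toy data for four strains (IDs are made up; "reference" stands in for C57BL/6J).
iap_calls = {
    "reference": {"iap1", "iap2", "iap3"},
    "strain_1": {"iap1", "iap2"},
    "strain_2": {"iap1", "iap4"},
    "strain_3": {"iap1", "iap2", "iap3"},
}
print(polymorphic_fraction(iap_calls))  # 0.75
```

In practice, the hard part is the locus matching that this sketch assumes away. The same insertion must be recognized at equivalent coordinates in each strain's assembly.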

7.
The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor “silent” germline micronuclear genome by a process of “unscrambling” and fragmentation. The tiny macronuclear “nanochromosomes” typically encode a single protein-coding gene (about 10% encode 2–8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report a high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that range from 469 bp to 66 kb in length (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes among eukaryotes. Comparison with other ciliates that have nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease.

8.
Transposable element (TE) amplification has been recognized as a driving force of genome size expansion and evolution, but its consequences for shaping 3D genome architecture remain largely unknown in plants. Here, using Oxford Nanopore Technologies sequencing, we report reference-grade genome assemblies for three cotton species spanning a 3-fold range in genome size: Gossypium rotundifolium (K2), G. arboreum (A2), and G. raimondii (D5). Comparative genome analyses document in detail the lineage-specific TE amplification contributing to the large genome size differences (K2, 2.44 Gb; A2, 1.62 Gb; D5, 750.19 Mb) and indicate relatively conserved gene content and synteny among the genomes. We found that approximately 17% of syntenic genes exhibit a change in chromatin status between active (“A”) and inactive (“B”) compartments, and TE amplification was associated with an increased proportion of A compartment in gene regions (∼7,000 genes) in K2 and A2 relative to D5. Only 42% of topologically associating domain (TAD) boundaries were conserved among the three genomes. Our data implicate recent amplification of TEs following the formation of lineage-specific TAD boundaries. This study sheds light on the role of transposon-mediated genome expansion in the evolution of higher-order chromatin structure in plants.

10.
We present an analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced, at high coverage, ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this size, and provide a proof of concept for the identification of rare and highly penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome appears to stabilize at about 144,000 new variants per genome after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.

11.
Endogenous retroviruses (ERVs) arise from retroviruses chromosomally integrated in the host germline. ERVs are common in vertebrate genomes and provide a valuable fossil record of past retroviral infections with which to investigate the biology and evolution of retroviruses over a deep time scale, including cross-species transmission events. Here we took advantage of a catalog of ERVs we recently produced for the bat Myotis lucifugus to seek evidence for infiltration of these retroviruses into other mammalian species (>100) currently represented in the genome sequence database. We provide multiple lines of evidence for the cross-ordinal transmission of a gammaretrovirus endogenized independently in the lineages of vespertilionid bats, felid cats and pangolins ~13–25 million years ago. Following its initial introduction, the ERV amplified extensively in parallel in both the bat and cat lineages, generating hundreds of species-specific insertions over the course of evolution. However, despite being derived from the same viral species, phylogenetic and selection analyses suggest that the ERV experienced different amplification dynamics in the two mammalian lineages. In the cat lineage, the ERV appears to have expanded primarily by retrotransposition of a single proviral progenitor that lost infectious capacity shortly after endogenization. In the bat lineage, the ERV followed a more complex path of germline invasion characterized by both retrotransposition and multiple infection events. The results also suggest that some of the bat ERVs have maintained infectious capacity for an extended period of time and may still be infectious today. This study provides one of the most rigorously documented cases of cross-ordinal transmission of a mammalian retrovirus. It also illustrates how the same retrovirus species has transitioned multiple times from an infectious pathogen to a genomic parasite (i.e. a retrotransposon), yet experienced different invasion dynamics in different mammalian hosts.

13.
All vertebrate genomes have been colonized by retroviruses along their evolutionary trajectory. Although endogenous retroviruses (ERVs) can contribute important physiological functions to contemporary hosts, such benefits are attributed to long-term coevolution of ERV and host, because germline infections are rare, expansion is slow, and the host effectively silences them. The genomes of several outbred species, including mule deer (Odocoileus hemionus), are currently being colonized by ERVs, which provides an opportunity to study ERV dynamics at a time when few are fixed. We previously established the locus-specific distribution of cervid ERV (CrERV) in populations of mule deer. In this study, we determine the molecular evolutionary processes acting on CrERV at each locus in the context of phylogenetic origin, genome location, and population prevalence. A mule deer genome was assembled de novo from short- and long-insert mate-pair reads, and CrERV sequences were generated at each locus. We report that CrERV composition and diversity have recently increased measurably through horizontal acquisition of a new retrovirus lineage. This new lineage has further expanded CrERV burden and CrERV genomic diversity by activating and recombining with existing CrERV. The resulting interlineage recombinants then endogenize and subsequently expand. CrERV loci are significantly closer to genes than expected if integration were random, and gene proximity might explain the recent expansion of one recombinant CrERV lineage. Thus, in mule deer, retroviral colonization is a dynamic period in the molecular evolution of CrERV that also provides a burst of genomic diversity to the host population.

17.
Endogenous retroviruses (ERVs) are vertically transmitted intragenomic elements derived from integrated retroviruses. ERVs can proliferate within the genome of their host until they either acquire inactivating mutations or are lost by recombinational deletion. We present a model that unifies current knowledge of ERV biology into a single evolutionary framework. The model predicts the possible long-term outcomes of retroviral germline infection and can account for the variable patterns of observed ERV genetic diversity. We hope the model will provide a useful framework for understanding ERV evolution, enabling the testing of evolutionary hypotheses and the estimation of parameters governing ERV proliferation.

19.
Endogenous retroviruses (ERVs), the remnants of retroviral infections in the germ line, occupy ~8% and ~10% of the human and mouse genomes, respectively, and affect their structure, evolution, and function. Yet we still have a limited understanding of how the genomic landscape influences the integration and fixation of ERVs. Here we conducted a genome-wide study of the most recently active ERVs in the human and mouse genomes. We investigated 826 fixed and 1,065 in vitro HERV-Ks in human, and 1,624 fixed and 242 polymorphic ETns, as well as 3,964 fixed and 1,986 polymorphic IAPs, in mouse. We quantified >40 human and mouse genomic features (e.g., non-B DNA structure, recombination rates, and histone modifications) within ±32 kb of these ERVs’ integration sites and in control regions, and analyzed them using Functional Data Analysis (FDA) methodology. In one of the first applications of FDA in genomics, we identified the genomic scales and locations at which these features exert their influence, and how they work in concert, to provide signals essential for the integration and fixation of ERVs. Investigating ERVs of different evolutionary ages (young in vitro and polymorphic ERVs, older fixed ERVs) allowed us to disentangle integration preferences from fixation preferences. From these analyses, we built a comprehensive model explaining the uneven distribution of ERVs along the genome. We found that ERVs integrate in late-replicating AT-rich regions with abundant microsatellites, mirror repeats, and repressive histone marks. Regions favoring fixation are depleted of genes and evolutionarily conserved elements and have low recombination rates, reflecting the effects of purifying selection and of ectopic recombination removing ERVs from the genome. In addition to providing these biological insights, our study demonstrates the power of exploiting multiple scales and localization with FDA. These techniques are expected to be applicable to many other genomic investigations.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417