Similar Articles
20 similar articles found.
1.
Whole genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variations. Here we report the results of a high-coverage whole genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many of the low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of these (∼96%) have a minor allele frequency of less than 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely “knock out” the corresponding genes. Across all 44 genomes, a total of 182 genes were “knocked out” in at least one individual genome, among which 46 genes were “knocked out” in over 30% of our samples, suggesting that a number of genes are commonly “knocked out” in the general population. Gene ontology analysis suggested that these commonly “knocked-out” genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.

2.
Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. Despite the vast amount of genomic data becoming available, many pressing questions remain about the nature of the genetic mutations that underlie functional variation. We combine genome-wide association analyses of 41 different phenotypes in ∼5,000 inbred maize lines to analyze patterns of high-resolution genetic association among 28.9 million single-nucleotide polymorphisms (SNPs) and ∼800,000 copy-number variants (CNVs). We show that genic and intergenic regions have opposite patterns of enrichment, minor allele frequencies, and effect sizes, implying tradeoffs among the probability that a given polymorphism will have an effect, the detectable size of that effect, and its frequency in the population. We also find that genes tagged by GWAS are enriched for regulatory functions and are ∼50% more likely to have a paralog than expected by chance, indicating that gene regulation and gene duplication are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.

3.
Salicinoids are well-known defense compounds in salicaceous trees and careful screening at the population level is warranted to fully understand their diversity and function. European aspen, Populus tremula, is a foundation species in Eurasia and highly polymorphic in Sweden. We exhaustively surveyed 102 replicated genotypes from the Swedish Aspen collection (SwAsp) for foliar salicinoids using UHPLC-ESI-TOF/MS and identified nine novel compounds, bringing the total to 19 for this species. Salicinoid structure followed a modular architecture of a salicin skeleton with added side groups, alone or in combination. Two main moieties, 2′-cinnamoyl and 2′-acetyl, grouped the SwAsp population into four distinct chemotypes, and the relative allocation of salicinoids was remarkably constant between different environments, implying a highly channeled biosynthesis of these compounds. Slightly more than half of the SwAsp genotypes belonged to the cinnamoyl chemotype. A fraction synthesized the acetyl moiety alone (∼7%) or in combination with cinnamoyl (∼2%), and close to forty percent lacked either of the two characteristic moieties, and thus resemble P. tremuloides in their salicinoid profile. The two most abundant chemotypes were evenly distributed throughout Sweden, unlike geographical patterns reported for SwAsp phenology traits, plant defense genes, and herbivore community associations. Here we present the salicinoid characterization of the SwAsp collection as a resource for future studies of aspen chemical ecology, salicinoid biosynthesis, and genetics.

4.
Yield Trends Are Insufficient to Double Global Crop Production by 2050
Several studies have shown that global crop production needs to double by 2050 to meet the projected demands from rising population, diet shifts, and increasing biofuel consumption. Boosting crop yields to meet these rising demands, rather than clearing more land for agriculture, has been highlighted as a preferred solution. However, we first need to understand how crop yields are changing globally and whether we are on track to double production by 2050. Using ∼2.5 million agricultural statistics, collected for ∼13,500 political units across the world, we track four key global crops (maize, rice, wheat, and soybean) that currently produce nearly two-thirds of global agricultural calories. We find that yields in these top four crops are increasing at 1.6%, 1.0%, 0.9%, and 1.3% per year (non-compounding rates), respectively, which is less than the 2.4% per year required to double global production by 2050. At these rates, global production of these crops would increase by ∼67%, ∼42%, ∼38%, and ∼55%, respectively, far below what is needed to meet projected demands in 2050. We present detailed maps to identify where rates must be increased to boost crop production and meet rising demands.
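The 2050 figures follow from treating each annual rate as a fixed, non-compounding percentage of base-year yield. The minimal Python sketch below assumes a 2008 base year (42 years to 2050); the abstract does not state the base year, but this assumption reproduces its ∼67/42/38/55% projections and the ∼2.4% per year needed for doubling.

```python
BASE_YEAR = 2008  # assumption: the base year is not stated in the abstract
TARGET_YEAR = 2050
YEARS = TARGET_YEAR - BASE_YEAR  # 42 years

# Observed yield trends from the abstract, in % of base-year yield added per year.
OBSERVED_RATES = {"maize": 1.6, "rice": 1.0, "wheat": 0.9, "soybean": 1.3}


def linear_gain_pct(rate_pct_per_year, years):
    """Total % increase when a fixed share of base-year yield is added each year (no compounding)."""
    return rate_pct_per_year * years


projected = {crop: round(linear_gain_pct(r, YEARS)) for crop, r in OBSERVED_RATES.items()}
required_rate = 100 / YEARS  # % per year needed for a +100% (doubling) gain

print(projected)                # {'maize': 67, 'rice': 42, 'wheat': 38, 'soybean': 55}
print(round(required_rate, 1))  # 2.4
```

The non-compounding convention matters: compounding 2.4% per year over 42 years would give roughly a 2.7-fold increase rather than a doubling.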

5.
A better understanding of the impact of global climate change requires information on the locations and characteristics of the populations affected. For instance, with global sea level predicted to rise and coastal flooding set to become more frequent and intense, high-resolution spatial population datasets are increasingly used to estimate the size of vulnerable coastal populations. Many previous studies have done this by quantifying the size of populations residing in low elevation coastal zones using one of the two available global spatial population datasets: LandScan and the Global Rural Urban Mapping Project (GRUMP). This has been done without considering the effects of this choice, which depend on the quality of the input datasets and on differences in the methods used to construct each spatial population dataset. Here we calculate estimated low elevation coastal zone resident population sizes from LandScan and GRUMP using previously adopted approaches, and quantify the absolute and relative differences that result from switching datasets. Our findings suggest that choosing one dataset over the other can translate into a difference of more than 7.5 million vulnerable people for countries with extensive coastal populations, such as Indonesia and Japan. Our findings also show that estimates of the proportion of national populations at risk differ by between <0.1% and 45% when switching between datasets, with large differences predominantly in countries where coarse and outdated input data were used to construct the spatial population datasets. The results highlight the need for spatial population datasets built on accurate, contemporary and detailed census data for use in climate change impact studies, and the importance of acknowledging the uncertainties inherent in existing spatial population datasets when estimating the demographic impacts of climate change.
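The dataset comparison amounts to summing gridded population over low elevation coastal zone (LECZ) cells under each dataset and differencing the results. This minimal Python sketch assumes the widely used LECZ definition (coast-contiguous land below 10 m elevation), assumes coastal contiguity has already been flagged per cell, and uses made-up four-cell inputs; it illustrates the comparison, not the study's actual processing.

```python
LECZ_MAX_ELEVATION_M = 10.0  # assumption: the common "<10 m, coast-contiguous" LECZ definition


def lecz_population(population, elevation_m, coast_contiguous):
    """Sum population over grid cells that touch the coast and lie below the elevation cut-off."""
    return sum(
        pop
        for pop, elev, coastal in zip(population, elevation_m, coast_contiguous)
        if coastal and elev < LECZ_MAX_ELEVATION_M
    )


def compare_datasets(pop_a, pop_b, elevation_m, coast_contiguous):
    """Absolute difference in LECZ residents and difference in national share at risk."""
    lecz_a = lecz_population(pop_a, elevation_m, coast_contiguous)
    lecz_b = lecz_population(pop_b, elevation_m, coast_contiguous)
    share_a = 100 * lecz_a / sum(pop_a)  # % of national population living in the LECZ
    share_b = 100 * lecz_b / sum(pop_b)
    return {
        "absolute_difference": abs(lecz_a - lecz_b),
        "share_difference_pct_points": abs(share_a - share_b),
    }


# Hypothetical 4-cell country; two datasets distribute the same 1,000 people differently.
elevation = [2.0, 8.0, 15.0, 5.0]
coastal = [True, True, True, False]
dataset_a = [100, 200, 300, 400]
dataset_b = [150, 50, 500, 300]
result = compare_datasets(dataset_a, dataset_b, elevation, coastal)
print(result)  # {'absolute_difference': 100, 'share_difference_pct_points': 10.0}
```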

6.
The results of physical activity (PA) intervention studies suggest that adaptation to mechanical loading at the femoral neck (FN) is weaker in girls than in boys. Less is known about sex differences associated with non-targeted PA levels at the FN or other clinically relevant regions of the proximal femur. Understanding sex-specific relationships between proximal femur sensitivity and mechanical loading during non-targeted PA is critical to planning appropriate public health interventions. We examined sex-specific associations between non-targeted PA and bone mineral density (BMD) of three sub-regions of the proximal femur in pre- and early-pubertal boys and girls. BMD at the FN, trochanter (TR) and intertrochanter (IT) regions, and whole-body lean mass were assessed using dual-energy x-ray absorptiometry in 161 girls (age: 9.7±0.3 yrs) and 164 boys (age: 9.7±0.3 yrs). PA was measured using accelerometry. Multiple linear regression analyses (adjusted for body height, total lean mass and pubertal status) revealed that vigorous PA explained 3–5% of the variability in BMD at all three sub-regions in boys. In girls, vigorous PA explained 4% of the variability in IT BMD and 6% in TR BMD, but did not contribute to the variance in FN BMD. An additional 10 minutes per day of vigorous PA would be expected to result in ∼1% higher FN, TR, and IT BMD in boys (p<0.05) and ∼2% higher IT and TR BMD in girls. In conclusion, vigorous PA can be expected to contribute positively to bone health outcomes in both boys and girls. However, the association of vigorous PA with sub-regions of the proximal femur varies by sex, such that, compared with boys, girls' associations are heterogeneous: lowest at the FN but stronger at the TR and the IT.

7.
8.

Background

Research investments are essential to address the burden of disease; however, the allocation of limited resources is poorly documented. We systematically reviewed the investments awarded by funding organisations to UK institutions and their global partners for infectious disease research.

Methodology/Principal Findings

Public and philanthropic investments for the period 1997 to 2010 were included. We categorised studies by infectious disease, cross-cutting theme, and by research and development value chain, reflecting the type of science. We identified 6165 funded studies, with a total research investment of UK £2.6 billion. Public organisations provided £1.4 billion (54.0%) of investments compared with £1.1 billion (42.4%) by philanthropic organisations. Global health studies represented an investment of £928 million (35.7%). The Wellcome Trust was the leading investor with £688 million (26.5%), closely followed by the UK Medical Research Council (MRC) with £673 million (25.9%). Funding over time was volatile, ranging from ∼£40 million to ∼£160 million per year for philanthropic organisations and ∼£30 million to ∼£230 million for public funders.

Conclusions/Significance

Infectious disease research funding requires global coordination and strategic long-term vision. Our analysis demonstrates the diversity and inconsistent patterns in investment, with volatility in annual funding amounts and limited investment for product development and clinical trials.

9.
Food waste contributes to excess consumption of freshwater and fossil fuels which, along with methane and CO2 emissions from decomposing food, impacts global climate change. Here, we calculate the energy content of nationwide food waste from the difference between the US food supply and the food consumed by the population. The latter was estimated using a validated mathematical model of metabolism relating body weight to the amount of food eaten. We found that US per capita food waste has progressively increased by ∼50% since 1974 reaching more than 1400 kcal per person per day or 150 trillion kcal per year. Food waste now accounts for more than one quarter of the total freshwater consumption and ∼300 million barrels of oil per year.
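The estimate rests on a simple difference (waste = supply − consumption) scaled to the whole population. The Python sketch below shows only that arithmetic: the US population of ∼300 million is an assumption (the abstract gives neither the population nor the reference year), and the metabolic model used to estimate consumption is not reproduced.

```python
DAYS_PER_YEAR = 365
US_POPULATION = 300e6  # assumption: approximate US population, not stated in the abstract


def per_capita_waste(supply_kcal_per_day, consumed_kcal_per_day):
    """Food waste per person per day: what is supplied minus what is eaten."""
    return supply_kcal_per_day - consumed_kcal_per_day


def national_waste_per_year(waste_kcal_per_person_day, population):
    """Scale per-capita daily waste to a national annual total."""
    return waste_kcal_per_person_day * population * DAYS_PER_YEAR


annual = national_waste_per_year(1400, US_POPULATION)  # 1400 kcal/person/day from the abstract
print(f"{annual / 1e12:.0f} trillion kcal per year")  # 153 trillion kcal per year
```

Under this population assumption the result is consistent with the abstract's figure of roughly 150 trillion kcal per year.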

10.
Multiple genome-wide and targeted association studies have revealed a significant association of variants in the CHRNA5-CHRNA3-CHRNB4 (CHRNA5/A3/B4) gene cluster on chromosome 15 with nicotine dependence. The subjects examined in most of these studies were of European origin. However, given the distinct linkage disequilibrium patterns in European and other ethnic populations, it is of considerable interest to determine whether such associations replicate in populations of other ethnicities, such as Asians. In this study, we performed comprehensive association and interaction analyses for 32 single-nucleotide polymorphisms (SNPs) in CHRNA5/A3/B4 with smoking initiation (SI), smoking quantity (SQ), and smoking cessation (SC) in a Korean sample (N = 8,842). We found nominally significant associations of 7 SNPs with at least one smoking-related phenotype in the total sample (SI: P = 0.015–0.023; SQ: P = 0.008–0.028; SC: P = 0.018–0.047) and in the male sample (SI: P = 0.001–0.023; SQ: P = 0.001–0.046; SC: P = 0.01). A spectrum of haplotypes formed by three consecutive SNPs located between rs16969948 in CHRNA5 and rs6495316 in the intergenic region downstream from the 5′ end of CHRNB4 was associated with all three smoking-related phenotypes in both the total and the male samples. Notably, the associations of these variants and haplotypes with SC appear to be much weaker than those with SI and SQ. In addition, we performed an interaction analysis of SNPs within the cluster using the generalized multifactor dimensionality reduction method and found a significant interaction among SNPs rs7163730 in LOC123688, rs6495308 in CHRNA3, and rs7166158, rs8043123, and rs11072793 in the intergenic region downstream from the 5′ end of CHRNB4 that influences SI in the male sample. Because fewer than 5% of the female participants were smokers, we did not perform any analysis on female subjects specifically.
Together, our detected associations of variants in the CHRNA5/A3/B4 cluster with SI, SQ, and SC in the Korean smoker samples provide strong evidence for the contribution of this cluster to the etiology of SI, ND, and SC in this Asian population.

11.
For the robust practice of genomic medicine, sequencing results must be compatible regardless of the sequencing technologies and algorithms used. At present, genome sequencing remains an imprecise science, complicated by differences in chemistry, coverage, alignment, and variant-calling algorithms. We identified ∼3.33 million and ∼3.62 million single nucleotide variants (SNVs) in the SJK genome using SOLiD and Illumina data, respectively. Approximately 3 million SNVs were concordant between the two platforms, while 68,532 SNVs were discordant; 219,616 SNVs were SOLiD-specific and 516,080 SNVs were Illumina-specific (i.e., platform-specific). Concordant, discordant, and platform-specific SNVs were further analyzed and characterized. Overall, a large portion of the heterozygous SNVs that were discordant with genotype calls from single nucleotide polymorphism chips were nevertheless called with high confidence. Approximately 70% of the platform-specific SNVs were located in regions containing repetitive sequences. Such platform specificity may arise from differences between the platforms in read length (36 bp and 72 bp vs. 50 bp), insert size (∼100–300 bp vs. ∼1–2 kb), sequencing chemistry (sequencing-by-synthesis using single nucleotides vs. ligation-based sequencing using oligomers), and sequencing quality. When data from the two platforms were merged for variant calling, the proportion of callable regions of the reference genome increased to 99.66%, 1.43 percentage points higher than the average callability of the two platforms and representing ∼40 million bases. In this study, we compared the sequencing results of two sequencing platforms. Approximately 90% of the SNVs were concordant between the two platforms, yet ∼10% were either discordant or platform-specific, indicating that each platform has its own strengths and weaknesses.
When data from the two platforms were merged, both the overall callability of the reference genome and the overall accuracy of the SNVs improved, suggesting that a resequenced genome can likely be refined using complementary data.
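The four SNV categories can be made concrete with set operations on the two call sets. This minimal Python sketch assumes each platform's calls are a mapping from (chromosome, position) to an unphased genotype such as "A/G". It counts a site as concordant when both platforms report the same genotype, discordant when both call the site but disagree, and platform-specific when only one platform calls it. The study's exact matching rules are not given in the abstract, and the sites below are hypothetical.

```python
def normalize(genotype):
    """Order-insensitive genotype, so "G/A" and "A/G" compare equal."""
    return "/".join(sorted(genotype.split("/")))


def classify_snvs(calls_a, calls_b):
    """Partition two platforms' SNV calls into concordant, discordant, and platform-specific sites."""
    shared = calls_a.keys() & calls_b.keys()
    concordant = {s for s in shared if normalize(calls_a[s]) == normalize(calls_b[s])}
    return {
        "concordant": concordant,
        "discordant": shared - concordant,
        "specific_a": calls_a.keys() - calls_b.keys(),
        "specific_b": calls_b.keys() - calls_a.keys(),
    }


# Hypothetical calls from two platforms at four sites.
platform_a = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr1", 300): "G/T"}
platform_b = {("chr1", 100): "G/A", ("chr1", 200): "C/T", ("chr1", 400): "A/A"}
for category, sites in classify_snvs(platform_a, platform_b).items():
    print(category, sorted(sites))
```

Under this partition each platform's total is concordant + discordant + its own specific sites, which matches how the abstract's counts relate (e.g. 3.33 M − 68,532 − 219,616 ≈ 3.04 M concordant SNVs for SOLiD, and about the same for Illumina).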

12.

Background

Evidence is accumulating for multiple roles of the β2-adrenoceptor gene in asthma, but results have been inconsistent, partly because of small sample sizes. To assess the association between β2-adrenoceptor gene polymorphisms and asthma risk, we performed a meta-analysis.

Methods

We comprehensively searched the PubMed, EMBASE, and BIOSIS Previews databases and extracted data from all eligible articles to estimate the association between β2-adrenoceptor gene polymorphisms and asthma risk. Pooled odds ratios (ORs) with 95% confidence intervals (CIs) were calculated.

Results

Thirty-seven studies involving 6,648 asthma patients and 15,943 controls were included in the meta-analysis. Overall, significant associations were found for Arg/Gly16 in the allelic genetic model (OR = 1.06, 95% CI = 1.01–1.12) and the recessive genetic model (OR = 1.11, 95% CI = 1.02–1.21). In stratification by ethnicity and age, significant associations were also found in the Asian population in the allelic, recessive, and additive genetic models. For Gln/Glu27, no significant association was found when all eligible studies were combined. Age stratification showed significant associations in adults in the allelic and recessive genetic models, but no significant association was found among Asians or Caucasians in ethnicity stratification.

Conclusions

This meta-analysis suggests that the β2-adrenoceptor Arg/Gly16 polymorphism is likely to contribute to asthma risk in the Asian population, and that the Gln/Glu27 polymorphism might contribute to asthma susceptibility in adults.
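The abstract reports pooled ORs but not the pooling method. As one common choice, not necessarily the one used here, the Python sketch below pools per-study log odds ratios by inverse-variance weighting (a fixed-effect model) and derives a 95% CI from the pooled standard error. It assumes allele-count 2×2 tables with no zero cells. A random-effects or Mantel-Haenszel model would be an equally plausible alternative, and zero cells would need a continuity correction.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio and its Woolf variance from a 2x2 allele table.

    a, b: risk-allele and other-allele counts in cases; c, d: the same in controls.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_or_fixed_effect(tables, z=1.96):
    """Inverse-variance weighted (fixed-effect) pooled OR with an approximate 95% CI."""
    total_weight, weighted_sum = 0.0, 0.0
    for a, b, c, d in tables:
        log_or, var = log_or_and_var(a, b, c, d)
        total_weight += 1 / var
        weighted_sum += log_or / var
    pooled = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Two hypothetical studies (not data from the meta-analysis).
or_, lo, hi = pooled_or_fixed_effect([(120, 380, 100, 400), (60, 140, 55, 145)])
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

Each study's weight is the reciprocal of its variance, so larger studies dominate the pooled estimate, and adding studies narrows the interval.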

13.
Spatially accurate, contemporary data on human population distributions are vitally important to many applied and theoretical researchers. The Southeast Asia region has undergone rapid urbanization and population growth over the past decade, yet existing spatial population distribution datasets covering the region are based principally on population count data from censuses circa 2000, often with insufficient spatial resolution or input data to map settlements precisely. Here we outline approaches to construct a database of GIS-linked circa 2010 census data and methods used to construct fine-scale (∼100 meters spatial resolution) population distribution datasets for each country in the Southeast Asia region. Landsat-derived settlement maps and land cover information were combined with ancillary datasets on infrastructure to model population distributions for 2010 and 2015. These products were compared with those from two other methods used to construct commonly used global population datasets. Results indicate that mapping accuracies are consistently higher when land cover and settlement information are incorporated into the AsiaPop modelling process. Using existing data, it is possible to produce detailed, contemporary and easily updatable population distribution datasets for Southeast Asia. The 2010 and 2015 datasets are freely available as a product of the AsiaPop Project and can be downloaded from: www.asiapop.org.
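One common way to combine census counts with settlement and land-cover maps is dasymetric redistribution: each census unit's total is spread over its grid cells in proportion to a weight assigned to each cell's land-cover class. The Python sketch below shows that single step. The class names and weights are hypothetical placeholders, not AsiaPop's parameters, which the abstract does not give.

```python
# Hypothetical relative weights per land-cover class (not AsiaPop's values).
CLASS_WEIGHTS = {"urban": 10.0, "rural_settlement": 3.0, "cropland": 0.5, "water": 0.0}


def dasymetric_allocate(unit_total, cell_classes, class_weights=CLASS_WEIGHTS):
    """Spread one census unit's population over its (non-empty list of) cells by class weight.

    The allocated values sum to unit_total (up to floating-point rounding),
    so census totals are preserved.
    """
    raw = [class_weights.get(cls, 0.0) for cls in cell_classes]
    total_weight = sum(raw)
    if total_weight == 0:
        # No weighted land cover in this unit: fall back to an even spread.
        return [unit_total / len(cell_classes)] * len(cell_classes)
    return [unit_total * w / total_weight for w in raw]


cells = ["urban", "urban", "cropland", "water"]
allocation = dasymetric_allocate(1000, cells)
print([round(v, 1) for v in allocation])  # [487.8, 487.8, 24.4, 0.0]
```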

14.
Seasonal affective disorder (SAD) famously follows annual cycles, with incidence rising in the fall and spring. Should some version of a cyclic annual pattern be expected in other psychiatric disorders? Would annual cycles be similar for distinct psychiatric conditions? This study probes these questions using 2 very large datasets describing the health histories of 150 million unique U.S. citizens and the entire Swedish population. We performed 2 types of analysis, using “uncorrected” and “corrected” observations. The former focused on counts of daily patient visits associated with each disease; the latter instead looked at the proportion of disease-specific visits within the total volume of visits for a time interval. In the uncorrected analysis, we found that psychiatric disorders’ annual patterns were remarkably similar across the studied diseases in both countries, with the magnitude of annual variation significantly higher in Sweden than in the United States for psychiatric, but not infectious, diseases. In the corrected analysis, only 1 group of patients (11 to 20 years old) reproduced all the regularities we observed for psychiatric disorders in the uncorrected analysis; the annual healthcare-seeking visit patterns associated with other age groups changed drastically. Analogous analyses of infectious diseases diverged less between these 2 types of computation. Comparing these 2 sets of results in the context of published psychiatric disorder seasonality studies, we tend to believe that our uncorrected results are more likely to capture the real trends, while the corrected results perhaps mostly reflect artifacts driven by the dominant fluctuations in overall health-seeking visits across the year. However, the divergent results are ultimately inconclusive; thus, we present both sets of results unredacted and, in the spirit of full disclosure, leave the verdict to the reader.

Should we expect psychiatric disorders to show a cyclic annual pattern? This study reveals that psychiatric diseases’ annual patterns were remarkably similar across the studied diseases in both the US and Sweden, with the magnitude of annual variation significantly higher in Sweden than in the US for psychiatric, but not infectious, diseases.
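The two analyses differ only in their denominator. The Python sketch below contrasts raw disease-specific visit counts with the same counts divided by total visits, and summarizes each series by its peak-to-trough range relative to the mean. The amplitude metric and the four-interval toy series are assumptions for illustration, not the study's statistics, and total visits are assumed nonzero. The example shows how a pattern present in raw counts can vanish after correction when overall visit volume rises and falls in step.

```python
def corrected_series(disease_visits, total_visits):
    """Share of all visits in each interval that are associated with the disease."""
    return [d / t for d, t in zip(disease_visits, total_visits)]


def relative_amplitude(series):
    """(max - min) / mean: a crude summary of how strongly a series varies over the year."""
    mean = sum(series) / len(series)
    return (max(series) - min(series)) / mean


# Toy series: disease visits rise and fall exactly in step with overall visit volume.
disease = [10, 20, 10, 20]
total = [100, 200, 100, 200]

uncorrected = disease                          # "uncorrected": raw counts
corrected = corrected_series(disease, total)   # "corrected": proportion of all visits
print(round(relative_amplitude(uncorrected), 3))  # 0.667
print(relative_amplitude(corrected))              # 0.0
```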

15.
The Hui people are unique among Chinese ethnic minorities in that they speak the same language as the Han Chinese (HAN) but practice Islam. However, although they are the second-largest minority group in China, numbering well over 10 million, the Huis are under-represented in both global and regional genomic studies. Here, we present the first whole-genome sequencing effort of 234 Hui individuals (NXH) aged over 60 who have been living in Ningxia, where the Huis are most concentrated. NXH are genetically more similar to East Asians than to any other global population. In particular, the genetic differentiation between NXH and HAN (FST = 0.0015) is only slightly larger than that between northern and southern HAN (FST = 0.0010), a difference largely attributable to the western ancestry in NXH (∼10%). Highly differentiated functional variants between NXH and HAN were identified in genes associated with skin pigmentation (e.g., SLC24A5), facial morphology (e.g., EDAR), and lipid metabolism (e.g., ABCG8). The Huis are also distinct from other Muslim groups such as the Uyghurs (FST = 0.0187); in particular, NXH derive much less western ancestry (∼10%) than the Uyghurs (∼50%). Modeling of admixture history indicated that NXH experienced two waves of admixture. An ancient admixture occurred ∼1,025 years ago, reflecting intensive west–east contacts during the late Tang Dynasty and the Five Dynasties and Ten Kingdoms period. A recent admixture occurred ∼500 years ago, corresponding to the Ming Dynasty. Notably, we identified considerable sex-biased admixture, that is, an excess of western males and eastern females contributing to the NXH gene pool. The origins and genomic diversity of the Hui people reflect the complex history of contacts between western and eastern Eurasians.

16.
To gain an understanding of the genomic structure and evolutionary history of the giant panda major histocompatibility complex (MHC) genes, we determined a 636,503-bp nucleotide sequence spanning the MHC class II region. Analysis revealed that the MHC class II region of this rare species contains 26 loci (17 predicted to be expressed), of which 10 are classical class II genes (1 DRA, 2 DRB, 2 DQA, 3 DQB, 1 DYB, 1 DPA, and 2 DPB) and 4 are non-classical class II genes (1 DOA, 1 DOB, 1 DMA, and 1 DMB). The presence of DYB, a gene specific to ruminants, prompted a comparison of the giant panda class II sequence with those of humans, cats, dogs, cattle, pigs, and mice. The results indicated that birth-and-death events within the DQ and DRB-DY regions led to major lineage differences, with these regions absent in the cat and in humans and mice, respectively. The phylogenetic trees constructed using all expressed alpha and beta genes from marsupials and placental mammals showed that: (1) because marsupials carry loci corresponding to DR, DP, DO and DM genes, these subregions most likely arose before the divergence of marsupials and placental mammals, approximately 150 million years ago (MYA); and (2) conversely, the DQ and DY regions must have evolved later, but before the radiation of placental mammals (100 MYA). As a result, the typical genomic structure of MHC class II genes in the giant panda is similar to that of other placental mammals and corresponds to BTNL2∼DR1∼DQ∼DR2∼DY∼DO_box∼DP∼COL11A2. Over the past 100 million years, the birth and death of mammalian DR, DQ, DY, and DP genes has produced the current species-specific genomic structure of the MHC class II region. Furthermore, in response to similar pathogens, mammals have adopted intra-subregion (DR and DQ) and inter-subregion (between DQ and DP) convergent evolutionary strategies for their alpha and beta genes, respectively.

17.

Background

Canine rabies is one of the most important and feared zoonotic diseases in the world. In some regions rabies elimination is being successfully coordinated, whereas in others rabies is endemic and continues to spread to uninfected areas. As epidemics emerge, both accepted and contentious control methods are used, as questions remain over the most effective strategy to eliminate rabies. The Indonesian island of Bali was rabies-free until 2008 when an epidemic in domestic dogs began, resulting in the deaths of over 100 people. Here we analyze data from the epidemic and compare the effectiveness of control methods at eliminating rabies.

Methodology/Principal Findings

Using data from Bali, we estimated the basic reproductive number, R0, of rabies in dogs to be ∼1.2, almost identical to that obtained in ten-fold less dense dog populations, suggesting that rabies will not be effectively controlled by reducing dog density. We then developed a model to compare options for mass dog vaccination. Comprehensive high coverage was the single most important factor for achieving elimination, with omission of even small areas (<0.5% of the dog population) jeopardizing success. Parameterizing the model with data from the 2010 and 2011 vaccination campaigns, we show that a comprehensive high-coverage campaign in 2012 would likely result in elimination, saving ∼550 human lives and ∼$15 million in prophylaxis costs over the next ten years.

Conclusions/Significance

Rabies will not be eliminated from Bali through realistically achievable reductions in dog density. To ensure elimination, concerted, repeated, high-coverage mass dog vaccination campaigns are necessary, and the cooperation of all regions of the island is critical. Momentum is building towards development of a strategy for the global elimination of canine rabies, and this study offers valuable new insights into the dynamics and control of this disease, with immediate practical relevance.
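A worked example helps connect R0 ≈ 1.2 to vaccination. In the simplest homogeneous-mixing model with a perfect vaccine, immunizing a fraction p of dogs reduces the effective reproductive number to R0(1 − p). Transmission declines once this falls below 1, giving the textbook critical coverage p_c = 1 − 1/R0, about 17% here. The Python sketch below computes this and projects expected cases per transmission generation. It deliberately ignores dog births and deaths between campaigns, waning immunity, and spatial gaps in coverage. The abstract identifies these as decisive (leaving even <0.5% of the dog population unvaccinated jeopardized elimination), so the idealized threshold should not be read as a campaign target.

```python
R0_DOGS = 1.2  # basic reproductive number estimated for Bali in the abstract


def effective_r(r0, immune_fraction):
    """Reproductive number when a fraction of dogs is immune (homogeneous mixing, perfect vaccine)."""
    return r0 * (1 - immune_fraction)


def critical_coverage(r0):
    """Immune fraction at which the effective reproductive number falls to exactly 1."""
    return 1 - 1 / r0


def project_cases(initial_cases, r0, immune_fraction, generations):
    """Expected cases per transmission generation with a constant effective R."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * effective_r(r0, immune_fraction))
    return cases


print(round(critical_coverage(R0_DOGS), 3))                         # 0.167
print([round(c, 1) for c in project_cases(100, R0_DOGS, 0.0, 3)])  # [100.0, 120.0, 144.0, 172.8]
print([round(c, 1) for c in project_cases(100, R0_DOGS, 0.7, 3)])  # [100.0, 36.0, 13.0, 4.7]
```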

18.
Identification of rare variants by resequencing is important both for detecting novel variations and for screening individuals for known disease alleles. New technologies enable low-cost resequencing of target regions, although it is still prohibitively expensive to test more than a few individuals. We propose a novel pooling design that enables the recovery of novel or known rare alleles and their carriers in groups of individuals. The method is based on a Compressed Sensing (CS) approach, which is general, simple and efficient. CS allows the use of generic algorithmic tools for simultaneous identification of multiple variants and their carriers. We model the experimental procedure and show via computer simulations that it enables the recovery of rare alleles and their carriers in larger groups than was previously possible. Our approach can also be combined with barcoding techniques to provide a feasible solution based on current resequencing costs. For example, when targeting a small enough genomic region (∼100 bp) and using only ∼10 sequencing lanes and ∼10 distinct barcodes per lane, one can recover the identity of 4 rare allele carriers in a population of over 4,000 individuals. We demonstrate the performance of our approach on several publicly available experimental data sets.
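The pooling idea can be illustrated with a toy compressed-sensing model. Each pool is a row of a 0/1 design matrix, each individual's rare-allele dosage (0, 1, or 2) is an entry of a sparse vector, and each pool measurement is the number of rare alleles in that pool. The abstract does not name its reconstruction algorithm, so this pure-Python sketch substitutes orthogonal matching pursuit (OMP), a standard greedy sparse-recovery method. It also assumes noiseless, exact per-pool allele counts, whereas real pooled sequencing yields noisy read-frequency estimates. All functions and parameters are hypothetical illustrations, not the authors' design.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_pooling_design(n_pools, n_individuals, p=0.5, seed=0):
    """0/1 matrix: entry [i][j] is 1 when individual j's DNA goes into pool i."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n_individuals)] for _ in range(n_pools)]


def pool_measurements(design, dosages):
    """Idealized, noiseless rare-allele count in each pool."""
    return [dot(row, dosages) for row in design]


def least_squares(cols, y):
    """Solve the normal equations (C^T C) c = C^T y by Gauss-Jordan elimination."""
    k = len(cols)
    aug = [[dot(cols[i], cols[j]) for j in range(k)] + [dot(cols[i], y)] for i in range(k)]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot] = aug[pivot], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[p])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def omp(design, y, max_carriers, tol=1e-9):
    """Orthogonal matching pursuit: greedily add the individual whose pooling
    pattern best matches the unexplained signal, then refit all chosen dosages."""
    n = len(design[0])
    cols = [[row[j] for row in design] for j in range(n)]
    norms = [math.sqrt(dot(c, c)) for c in cols]
    support, coef, residual = [], [], list(y)
    for _ in range(max_carriers):
        if math.sqrt(dot(residual, residual)) <= tol:
            break  # measurements fully explained
        scores = [abs(dot(cols[j], residual)) / norms[j] if norms[j] and j not in support else -1.0
                  for j in range(n)]
        support.append(max(range(n), key=scores.__getitem__))
        coef = least_squares([cols[j] for j in support], y)
        residual = [y[i] - sum(c * cols[j][i] for c, j in zip(coef, support)) for i in range(len(y))]
    estimate = [0.0] * n
    for c, j in zip(coef, support):
        estimate[j] = c
    return estimate


if __name__ == "__main__":
    n_individuals, n_pools = 100, 30
    dosages = [0] * n_individuals
    dosages[7], dosages[42] = 1, 2  # two hypothetical carriers
    design = random_pooling_design(n_pools, n_individuals)
    est = omp(design, pool_measurements(design, dosages), max_carriers=3)
    print({j: round(v, 2) for j, v in enumerate(est) if abs(v) > 1e-6})
```

With several carriers and random pools, exact recovery is likely rather than guaranteed and improves as pools are added, so the printed result should be checked against the planted carriers rather than assumed.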

19.
miRNAs have emerged as important regulators of gene expression, and their deregulation is a common feature of a variety of diseases, especially cancer. Currently, many efforts focus on studying miRNA expression patterns and on miRNA target validation. Here, we show that overexpression of the miR-23a∼27a∼24-2 cluster in HEK293T cells induces apoptosis through both caspase-dependent and caspase-independent pathways, as shown by annexin assay, caspase activation, and the release of cytochrome c and AIF (apoptosis-inducing factor) from mitochondria. Furthermore, the overexpressed cluster modulates the expression of a number of genes involved in apoptosis, including FADD (Fas-associated protein with death domain). Bioinformatically, FADD is predicted to be a target of hsa-miR-27a; interestingly, FADD protein was found to be upregulated, consistent with the very low expression of hsa-miR-27a in HEK293T cells. This effect was direct, as hsa-miR-27a negatively regulated the expression of a FADD 3′UTR-based reporter construct. Moreover, we showed that overexpression of miR-23a∼27a∼24-2 sensitized HEK293T cells to TNF-α cytotoxicity. Taken together, our finding that overexpression of the miR-23a∼27a∼24-2 cluster enhances TNF-α-induced apoptosis in HEK293T cells provides new insights for the development of novel cancer therapeutics.

20.
The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor “silent” germline micronuclear genome by a process of “unscrambling” and fragmentation. The tiny macronuclear “nanochromosomes” typically encode single, protein-coding genes (a small portion, 10%, encode 2–8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report the high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that vary from 469 bp to 66 kb long (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes of eukaryotes. Comparison to other ciliates with nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease.


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417