1.

Improving Nelumbo nucifera genome assemblies using high‐resolution genetic maps and BioNano genome mapping reveals ancient chromosome rearrangements

Songtao Gui Jing Peng Xiaolei Wang Zhihua Wu Rui Cao Jérôme Salse Hongyuan Zhang Zhixuan Zhu Qiuju Xia Zhiwu Quan Liping Shu Wedong Ke Yi Ding 《The Plant journal : for cell and molecular biology》2018,94(4):721-734

Genetic and physical maps are powerful tools to anchor fragmented draft genome assemblies generated from next‐generation sequencing. Currently, two draft assemblies of Nelumbo nucifera, the genomes of ‘China Antique’ and ‘Chinese Tai‐zi’, have been released. However, there is presently no information on how the sequences are assembled into chromosomes in N. nucifera. The lack of physical maps and inadequate resolution of available genetic maps hindered the assembly of N. nucifera chromosomes. Here, a linkage map of N. nucifera containing 2371 bin markers [217 577 single nucleotide polymorphisms (SNPs)] was constructed using restriction‐site associated DNA sequencing data of 181 F₂ individuals and validated by adding 197 simple sequence repeat (SSR) markers. Additionally, a BioNano optical map covering 86.20% of the ‘Chinese Tai‐zi’ genome was constructed. The draft assembly of ‘Chinese Tai‐zi’ was improved based on the BioNano optical map, showing an increase of the scaffold N50 from 0.989 to 1.48 Mb. Using a combination of multiple maps, 97.9% of the scaffolds in the ‘Chinese Tai‐zi’ draft assembly and 97.6% of the scaffolds in the ‘China Antique’ draft assembly were anchored into pseudo‐chromosomes, and the centromere regions along the pseudo‐chromosomes were identified. An evolutionary scenario was proposed to reach the modern N. nucifera karyotype from the seven ancestral eudicot chromosomes. The present study provides the highest‐resolution linkage map, the optical map and chromosome level genome assemblies for N. nucifera, which are valuable for the breeding and cultivation of N. nucifera and future studies of comparative and evolutionary genomics in angiosperms.

2.

Computational comparison of two mouse draft genomes and the human golden path

Xuan Z Wang J Zhang MQ 《Genome biology》2003,4(1):R1

Background

The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods.

3.

Two Low Coverage Bird Genomes and a Comparison of Reference-Guided versus De Novo Genome Assemblies

Daren C. Card Drew R. Schield Jacobo Reyes-Velasco Matthew K. Fujita Audra L. Andrew Sara J. Oyler-McCance Jennifer A. Fike Diana F. Tomback Robert P. Ruggiero Todd A. Castoe 《PloS one》2014,9(9)

As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

4.

Assessment of the relationship between signal intensities and transcript concentration for Affymetrix GeneChip® arrays

Chudin Eugene Walker Randal Kosaka Alan Wu Sue X Rabert Douglas Chang Thomas K Kreder Dirk E 《Genome biology》2002,4(1):1-10

Background

The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods.

Results

We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes.

Conclusion

The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome clone (BAC) regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable publicly available online CSEdb will expedite new discoveries through comparative genomics.

5.

BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes

Helena Staňková Alex R. Hastie Saki Chan Jan Vrána Zuzana Tulpová Marie Kubaláková Paul Visendi Satomi Hayashi Mingcheng Luo Jacqueline Batley David Edwards Jaroslav Doležel Hana Šimková 《Plant biotechnology journal》2016,14(7):1523-1531

The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.

6.

Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird‐of‐paradise

Valentina Peona Mozes P. K. Blom Luohao Xu Reto Burri Shawn Sullivan Ignas Bunikis Ivan Liachko Tri Haryoko Knud A. Jønsson Qi Zhou Martin Irestedt Alexander Suh 《Molecular ecology resources》2021,21(1):263-286

Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich and GC‐rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long‐read, linked‐read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC‐rich microchromosomes and the repeat‐rich W chromosome. Telomere‐to‐telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.

7.

Multiplex sequencing of bacterial artificial chromosomes for assembling complex plant genomes

Sebastian Beier Axel Himmelbach Thomas Schmutzer Marius Felder Stefan Taudien Klaus F. X. Mayer Matthias Platzer Nils Stein Uwe Scholz Martin Mascher 《Plant biotechnology journal》2016,14(7):1511-1522

Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

8.

Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism

René L. Warren Christopher I. Keeling Macaire Man Saint Yuen Anthony Raymond Greg A. Taylor Benjamin P. Vandervalk Hamid Mohamadi Daniel Paulino Readman Chiu Shaun D. Jackman Gordon Robertson Chen Yang Brian Boyle Margarete Hoffmann Detlef Weigel David R. Nelson Carol Ritland Nathalie Isabel Barry Jaquish Alvin Yanchuk Jean Bousquet Steven J. M. Jones John MacKay Inanc Birol Joerg Bohlmann 《The Plant journal : for cell and molecular biology》2015,83(2):189-212

9.

Genome assembly forensics: finding the elusive mis-assembly

Phillippy AM Schatz MC Pop M 《Genome biology》2008,9(3):R55

We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at .

10.

Reducing assembly complexity of microbial genomes with single-molecule sequencing (total citations: 1; self-citations: 0; citations by others: 1)

Sergey Koren Gregory P Harhay Timothy PL Smith James L Bono Dayna M Harhay Scott D Mcvey Diana Radune Nicholas H Bergman Adam M Phillippy 《Genome biology》2013,14(9):R101

Background

The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem.

Results

To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads.

Conclusions

Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.

11.

The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation (total citations: 1; self-citations: 0; citations by others: 1)

Konstantinos Mavromatis Miriam L. Land Thomas S. Brettin Daniel J. Quest Alex Copeland Alicia Clum Lynne Goodwin Tanja Woyke Alla Lapidus Hans Peter Klenk Robert W. Cottingham Nikos C. Kyrpides 《PloS one》2012,7(12)

Background

The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation.

Methodology/Principal Findings

In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and the sources of error in them should be considered prior to analysis.

Conclusion

These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

12.

Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies

James F. Denton Jose Lugo-Martinez Abraham E. Tucker Daniel R. Schrider Wesley C. Warren Matthew W. Hahn 《PLoS computational biology》2014,10(12)

Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

13.

De novo assemblies of Luffa acutangula and Luffa cylindrica genomes reveal an expansion associated with substantial accumulation of transposable elements

Wirulda Pootakham Chutima Sonthirod Chaiwat Naktang Wanapinun Nawae Thippawan Yoocha Wasitthee Kongkachana Duangjai Sangsrakru Nukoon Jomchai Sonicha U‐thoomporn John R. Sheedy Jarunee Buaboocha Supat Mekiyanon Sithichoke Tangphatsornruang 《Molecular ecology resources》2021,21(1):212-225

14.

Advances in plant chromosome genomics

Jaroslav Doležel Jan Vrána Petr Cápal Marie Kubaláková Veronika Burešová Hana Šimková 《Biotechnology advances》2014

Next generation sequencing (NGS) is revolutionizing genomics and is providing novel insights into genome organization, evolution and function. The number of plant genomes targeted for sequencing is rising. For the moment, however, the acquisition of full genome sequences in large genome species remains difficult, largely because the short reads produced by NGS platforms are inadequate to cope with repeat-rich DNA, which forms a large part of these genomes. The problem of sequence redundancy is compounded in polyploids, which dominate the plant kingdom. An approach to overcoming some of these difficulties is to reduce the full nuclear genome to its individual chromosomes using flow-sorting. The DNA acquired in this way has proven to be suitable for many applications, including PCR-based physical mapping, in situ hybridization, forming DNA arrays, the development of DNA markers, the construction of BAC libraries and positional cloning. Coupling chromosome sorting with NGS offers opportunities for the study of genome organization at the single chromosomal level, for comparative analyses between related species and for the validation of whole genome assemblies. Apart from the primary aim of reducing the complexity of the template, taking a chromosome-based approach enables independent teams to work in parallel, each tasked with the analysis of a different chromosome(s). Given that the number of plant species tractable for chromosome sorting is increasing, the likelihood is that chromosome genomics – the marriage of cytology and genomics – will make a significant contribution to the field of plant genetics.

15.

Construction and Characterization of the BAC Library for Common Carp Cyprinus carpio L. and Establishment of Microsynteny with Zebrafish Danio rerio

Li Y Xu P Zhao Z Wang J Zhang Y Sun XW 《Marine biotechnology (New York, N.Y.)》2011,13(4):706-712

A bacterial artificial chromosome (BAC) library of common carp Cyprinus carpio L. was constructed as part of the ongoing common carp genome project, which aims to assemble the common carp genome. The library, containing a total of 92,160 BAC clones with an average insert size of 141 kb, was constructed in the HindIII restriction site of the BAC vector CopyControl pCC1BAC and covers 7.7X haploid genome equivalents. Three-dimensional pools and superpools of the BAC library were established, and 23 positive clones of 14 targets were identified from one-fifth of the BAC library. A pilot project of BAC end sequencing was conducted on 2,688 BAC ends from 1,344 clones and yielded 2,522 high-quality Q20 sequences with an average length of 677 bp. The sequencing success rate was 93.8% and the paired-end success rate was 92.3%. A total of 212 microsyntenies were established between the common carp and zebrafish genomes as a trial for genome-wide comparative genomics in these two closely related species.
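The library figures quoted in this abstract can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: it assumes the usual definition of library coverage (number of clones times mean insert size, divided by haploid genome size), and the function names are hypothetical rather than taken from the paper. The genome size it prints is merely what the quoted numbers imply, not a value stated in the abstract.

```python
def library_coverage(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Haploid genome equivalents covered by a clone library (assumed standard definition)."""
    return n_clones * mean_insert_bp / genome_size_bp


def implied_genome_size(n_clones: int, mean_insert_bp: float, coverage: float) -> float:
    """Genome size (bp) implied by a stated clone count, insert size and coverage."""
    return n_clones * mean_insert_bp / coverage


# Figures quoted in the abstract.
N_CLONES = 92_160
MEAN_INSERT_BP = 141_000  # 141 kb
STATED_COVERAGE = 7.7

genome_bp = implied_genome_size(N_CLONES, MEAN_INSERT_BP, STATED_COVERAGE)
print(f"implied haploid genome size: {genome_bp / 1e9:.2f} Gb")

# BAC end sequencing: 2,522 high-quality reads out of 2,688 attempted ends.
success_rate = 100 * 2_522 / 2_688
print(f"sequencing success rate: {success_rate:.1f}%")
```

Under these assumptions the stated 7.7X coverage corresponds to a haploid genome of roughly 1.7 Gb, and the read counts reproduce the reported 93.8% sequencing success rate.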

16.

A reference genome of the Chinese hamster based on a hybrid assembly strategy

《Biotechnology and bioengineering》2018,115(8):2087-2100

Accurate and complete genome sequences are essential in biotechnology to facilitate genome‐based cell engineering efforts. The current genome assemblies for Cricetulus griseus, the Chinese hamster, are fragmented and replete with gap sequences and misassemblies, consistent with most short‐read‐based assemblies. Here, we completely resequenced C. griseus using single molecule real time sequencing and merged this with Illumina‐based assemblies. This generated a more contiguous and complete genome assembly than either technology alone, reducing the number of scaffolds by >28‐fold, with 90% of the sequence in the 122 longest scaffolds. Most genes are now found in single scaffolds, including up‐ and downstream regulatory elements, enabling improved study of noncoding regions. With >95% of the gap sequence filled, important Chinese hamster ovary cell mutations have been detected in draft assembly gaps. This new assembly will be an invaluable resource for continued basic and pharmaceutical research.

17.

Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)

Martin Mascher Gary J. Muehlbauer Daniel S. Rokhsar Jarrod Chapman Jeremy Schmutz Kerrie Barry María Muñoz‐Amatriaín Timothy J. Close Roger P. Wise Alan H. Schulman Axel Himmelbach Klaus F.X. Mayer Uwe Scholz Jesse A. Poland Nils Stein Robbie Waugh 《The Plant journal : for cell and molecular biology》2013,76(4):718-727

Next‐generation whole‐genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence‐based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost‐efficient establishment of powerful genomic information for many species.

18.

A high‐throughput BAC end analysis protocol (BAC‐anchor) for profiling genome assembly and physical mapping

Xiaohui Yang Yu Yang Jian Ling Jiantao Guan Xiao Guo Daofeng Dong Liping Jin Sanwen Huang Jun Liu Guangcun Li 《Plant biotechnology journal》2020,18(2):364-372

Traditional approaches for sequencing insertion ends of bacterial artificial chromosome (BAC) libraries are laborious and expensive, and they are currently among the bottlenecks limiting a better understanding of the genomic features of auto‐ or allopolyploid species. Here, we developed a highly efficient and low‐cost BAC end analysis protocol, named BAC‐anchor, to identify paired‐end reads containing large internal gaps. Our approach mainly focused on the identification of high‐throughput sequencing reads carrying restriction enzyme cutting sites and searching for large internal gaps based on the mapping locations of both ends of the reads. We sequenced and analysed eight libraries containing over 3 200 000 BAC end clones derived from the BAC library of the tetraploid potato cultivar C88 digested with two restriction enzymes, Cla I and Mlu I. About 25% of the BAC end reads carrying cutting sites generated a 60–100 kb internal gap in the potato DM reference genome, which was consistent with the mapping results of Sanger sequencing of the BAC end clones and indicated large differences between autotetraploid and haploid genotypes in potato. A total of 5341 Cla I‐ and 165 Mlu I‐derived unique reads were distributed on different chromosomes of the DM reference genome and could be used to establish a physical map of target regions and assemble the C88 genome. The reads that matched different chromosomes are especially significant for the further assembly of complex polyploid genomes. Our study provides an example of analysing high‐coverage BAC end libraries with low sequencing cost and is a resource for further genome sequencing studies.

19.

Use of low-coverage, large-insert, short-read data for rapid and accurate generation of enhanced-quality draft Pseudomonas genome sequences

O'Brien HE Gong Y Fung P Wang PW Guttman DS 《PloS one》2011,6(11):e27199

Next-generation genomic technology has both greatly accelerated the pace of genome research as well as increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.

20.

Assessing the gene space in draft genomes

Genis Parra Keith Bradnam Zemin Ning Thomas Keane Ian Korf 《Nucleic acids research》2009,37(1):289-297

Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.
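Several abstracts on this page report N50 as a contiguity metric. As a reminder of what that number means, here is a minimal Python sketch of the conventional definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the example lengths are illustrative and are not drawn from any of the listed papers.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    Sequences are taken from longest to shortest; the N50 is the length of the
    sequence at which the running total first reaches half of the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Hypothetical toy assembly: total length 32, so half is 16.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about whether genes are present or correctly assembled, which is why the abstract above proposes the proportion of mapped CEGs as a complementary measure.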