Similar literature
1.
Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of genome sequences of the Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome: one by de novo assembly from shotgun sequencing reads, and one by re‐scaffolding and gap‐filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as “high confidence regions”, which account for at least 78% of the assembled genome. Further, a genome‐wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high‐quality scaffolds. Genome‐scale collinearity was complemented with EST‐based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines.

2.
Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

3.
Genetic and physical maps are powerful tools to anchor fragmented draft genome assemblies generated from next‐generation sequencing. Currently, two draft assemblies of Nelumbo nucifera, the genomes of ‘China Antique’ and ‘Chinese Tai‐zi’, have been released. However, there is presently no information on how the sequences are assembled into chromosomes in N. nucifera. The lack of physical maps and inadequate resolution of available genetic maps hindered the assembly of N. nucifera chromosomes. Here, a linkage map of N. nucifera containing 2371 bin markers [217 577 single nucleotide polymorphisms (SNPs)] was constructed using restriction‐site associated DNA sequencing data of 181 F2 individuals and validated by adding 197 simple sequence repeat (SSR) markers. Additionally, a BioNano optical map covering 86.20% of the ‘Chinese Tai‐zi’ genome was constructed. The draft assembly of ‘Chinese Tai‐zi’ was improved based on the BioNano optical map, showing an increase of the scaffold N50 from 0.989 to 1.48 Mb. Using a combination of multiple maps, 97.9% of the scaffolds in the ‘Chinese Tai‐zi’ draft assembly and 97.6% of the scaffolds in the ‘China Antique’ draft assembly were anchored into pseudo‐chromosomes, and the centromere regions along the pseudo‐chromosomes were identified. An evolutionary scenario was proposed to reach the modern N. nucifera karyotype from the seven ancestral eudicot chromosomes. The present study provides the highest‐resolution linkage map, the optical map and chromosome level genome assemblies for N. nucifera, which are valuable for the breeding and cultivation of N. nucifera and future studies of comparative and evolutionary genomics in angiosperms.

4.
Many economically important crops have large and complex genomes that hamper their sequencing by standard methods such as whole genome shotgun (WGS). Large tracts of methylated repeats occur in plant genomes that are interspersed by hypomethylated gene‐rich regions. Gene‐enrichment strategies based on methylation profiles offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration with McrBC endonuclease digestion to enrich for euchromatic regions in the sugarcane genome. To verify the efficiency of methyl filtration and the assembly quality of sequences submitted to the gene‐enrichment strategy, we have compared assemblies using methyl‐filtered (MF) and unfiltered (UF) libraries. The use of methyl filtration allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5× more scaffolds and 1.7× more assembled Mb in length compared with the unfiltered dataset. The coverage of sorghum coding sequences (CDS) by MF scaffolds was at least 36% higher than by the use of UF scaffolds. Using MF technology, we increased by 134× the coverage of gene regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds that covered all genes of the sugarcane bacterial artificial chromosomes (BACs), 97.2% of sugarcane expressed sequence tags (ESTs), 92.7% of sugarcane RNA‐seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds from encoded enzymes of the sucrose/starch pathway discovered 291 single‐nucleotide polymorphisms (SNPs) in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes was also identified in the MF scaffolds. The information achieved by the MF dataset provides a valuable tool for genomic research in the genus Saccharum and for improvement of sugarcane as a biofuel crop.

5.
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.

6.
7.
The 1.5 Gbp/2C genome of pedunculate oak (Quercus robur) has been sequenced. A strategy was established for dealing with the challenges imposed by the sequencing of such a large, complex and highly heterozygous genome by a whole‐genome shotgun (WGS) approach, without the use of costly and time‐consuming methods, such as fosmid or BAC clone‐based hierarchical sequencing methods. The sequencing strategy combined short and long reads. Over 49 million reads provided by Roche 454 GS‐FLX technology were assembled into contigs and combined with shorter Illumina sequence reads from paired‐end and mate‐pair libraries of different insert sizes, to build scaffolds. Errors were corrected and gaps filled with Illumina paired‐end reads and contaminants detected, resulting in a total of 17 910 scaffolds (>2 kb) corresponding to 1.34 Gb. Fifty per cent of the assembly was accounted for by 1468 scaffolds (N50 of 260 kb). Initial comparison with the phylogenetically related Prunus persica gene model indicated that genes for 84.6% of the proteins present in peach (mean protein coverage of 90.5%) were present in our assembly. The second and third steps in this project are genome annotation and the assignment of scaffolds to the oak genetic linkage map. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement, the oak genome data have been released into public sequence repositories in advance of publication. In this presubmission paper, the oak genome consortium describes its principal lines of work and future directions for analyses of the nature, function and evolution of the oak genome.
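
The contiguity figures quoted in the abstract above (an "N50 of 260 kb", reached with 1468 scaffolds) follow the usual definition: sort the scaffolds from longest to shortest and walk down the list until the running total reaches half of the assembly length. The length of the scaffold at that point is the N50, and the number of scaffolds needed to get there is often called the L50. The following is a minimal sketch of that calculation; the function name `n50_l50` and the toy lengths are illustrative assumptions, not data from the oak assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold or contig lengths.

    N50: length of the scaffold at which the cumulative length, counted from
         the longest scaffold downwards, first reaches half the total.
    L50: number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example (hypothetical lengths in kb): total = 290, half = 145.
# 80 + 70 = 150 >= 145, so N50 = 70 and L50 = 2.
print(n50_l50([80, 70, 50, 40, 30, 20]))
```

In the abstract's terms, the 260 kb figure corresponds to the first returned value and the 1468 scaffolds to the second.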

8.
We report on a whole‐genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe. Through whole‐genome shotgun sequencing of the 7.9‐Gbp genome of the winter rye inbred line Lo7 we obtained a de novo assembly represented by 1.29 million scaffolds covering a total length of 2.8 Gbp. Our reference sequence represents nearly the entire low‐copy portion of the rye genome. This genome assembly was used to predict 27 784 rye gene models based on homology to sequenced grass genomes. Through resequencing of 10 rye inbred lines and one accession of the wild relative S. vavilovii, we discovered more than 90 million single nucleotide variants and short insertions/deletions in the rye genome. From these variants, we developed the high‐density Rye600k genotyping array with 600 843 markers, which enabled anchoring the sequence contigs along a high‐density genetic map and establishing a synteny‐based virtual gene order. Genotyping data were used to characterize the diversity of rye breeding pools and genetic resources, and to obtain a genome‐wide map of selection signals differentiating the divergent gene pools. This rye whole‐genome sequence closes a gap in Triticeae genome research, and will be highly valuable for comparative genomics, functional studies and genome‐based breeding in rye.

9.
Salmonids are of particular interest to evolutionary biologists due to their incredible diversity of life‐history strategies and the speed at which many salmonid species have diversified. In Switzerland alone, over 30 species of Alpine whitefish from the subfamily Coregoninae have evolved since the last glacial maximum, with species exhibiting a diverse range of morphological and behavioural phenotypes. This, combined with the whole genome duplication which occurred in the ancestor of all salmonids, makes the Alpine whitefish radiation a particularly interesting system in which to study the genetic basis of adaptation and speciation and the impacts of ploidy changes and subsequent rediploidization on genome evolution. Although well‐curated genome assemblies exist for many species within Salmonidae, genomic resources for the subfamily Coregoninae are lacking. To assemble a whitefish reference genome, we carried out PacBio sequencing from one wild‐caught Coregonus sp. “Balchen” from Lake Thun to ~90× coverage. PacBio reads were assembled independently using three different assemblers, falcon, canu and wtdbg2, and subsequently scaffolded with additional Hi‐C data. All three assemblies were highly contiguous, had strong synteny to a previously published Coregonus linkage map, and when mapping additional short‐read data to each of the assemblies, coverage was fairly even across most chromosome‐scale scaffolds. Here, we present the first de novo genome assembly for the Salmonid subfamily Coregoninae. The final 2.2‐Gb wtdbg2 assembly included 40 scaffolds, an N50 of 51.9 Mb and was 93.3% complete for BUSCOs. The assembly consisted of ~52% transposable elements and contained 44,525 genes.

10.
Yak is an important livestock animal for the people indigenous to the harsh, oxygen‐limited Qinghai‐Tibetan Plateau and Hindu Kush ranges of the Himalayas. The yak genome was sequenced in 2012, but its assembly was fragmented because of the inherent limitations of the Illumina sequencing technology used to analyse it. An accurate and complete reference genome is essential for the study of genetic variations in this species. Long‐read sequences are more complete than their short‐read counterparts and have been successfully applied towards high‐quality genome assembly for various species. In this study, we present a high‐quality chromosome‐scale yak genome assembly (BosGru_PB_v1.0) constructed with long‐read sequencing and chromatin interaction technologies. Compared to an existing yak genome assembly (BosGru_v2.0), BosGru_PB_v1.0 shows substantially improved chromosome sequence continuity, reduced repetitive structure ambiguity, and improved gene model completeness. To characterize genetic variation in yak, we generated de novo genome assemblies based on Illumina short reads for seven recognized domestic yak breeds in Tibet and Sichuan and one wild yak from Hoh Xil. We compared these eight assemblies to the BosGru_PB_v1.0 genome, obtained a comprehensive map of yak genetic diversity at the whole‐genome level, and identified several protein‐coding genes absent from the BosGru_PB_v1.0 assembly. Despite the genetic bottleneck experienced by wild yak, their diversity was nonetheless higher than that of domestic yak. Here, we identified breed‐specific sequences and genes by whole‐genome alignment, which may facilitate yak breed identification.

11.
The Chinese hamster genome serves as a reference genome for the study of Chinese hamster ovary (CHO) cells, the preferred host system for biopharmaceutical production. Recent re-sequencing of the Chinese hamster genome resulted in the RefSeq PICR meta-assembly, a set of highly accurate scaffolds that filled over 95% of the gaps in previous assembly versions. However, these scaffolds did not reach chromosome-scale due to the absence of long-range scaffolding information during the meta-assembly process. Here, long-range scaffolding of the PICR Chinese hamster genome assembly was performed using high-throughput chromosome conformation capture (Hi-C). This process resulted in a new “PICRH” genome, where 97% of the genome is contained in 11 mega-scaffolds corresponding to the Chinese hamster chromosomes (2n = 22) and the total number of scaffolds is reduced by three-fold from 1,830 scaffolds in PICR to 647 in PICRH. Continuity was improved while preserving accuracy, leading to quality scores higher than recent builds of mouse chromosomes and comparable to human chromosomes. The PICRH genome assembly will be an indispensable tool for designing advanced genetic engineering strategies in CHO cells and enabling systematic examination of genomic and epigenomic instability through comparative analysis of CHO cell lines on a common set of chromosomal coordinates.

12.
Onychostoma macrolepis is an emerging commercial cyprinid fish species. It is a model system for studies of sexual dimorphism and genome evolution. Here, we report the chromosome‐level assembly of the O. macrolepis genome obtained from the integration of nanopore long‐read sequencing with physical maps produced using Bionano and Hi‐C technology. A total of 87.9 Gb of nanopore sequence provided approximately 100‐fold coverage of the genome. The preliminary genome assembly was 883.2 Mb in size with a contig N50 size of 11.2 Mb. The 969 corrected contigs obtained from Bionano optical mapping were assembled into 853 scaffolds and produced an assembly of 886.5 Mb with a scaffold N50 of 16.5 Mb. Finally, using the Hi‐C data, 881.3 Mb (99.4% of genome) in 526 scaffolds were anchored and oriented in 25 chromosomes ranging in size from 25.27 to 56.49 Mb. In total, 24,770 protein‐coding genes were predicted in the genome, and ~96.85% of the genes were functionally annotated. The annotated assembly contains 93.3% complete genes from the BUSCO reference set. In addition, we identified 409 Mb (46.23% of the genome) of repetitive sequence, and 11,213 non‐coding RNAs, in the genome. Evolutionary analysis revealed that O. macrolepis diverged from common carp approximately 24.25 million years ago. The chromosomes of O. macrolepis showed an unambiguous correspondence to the chromosomes of zebrafish. The high‐quality genome assembled in this work provides a valuable genomic resource for further biological and evolutionary studies of O. macrolepis.
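
The "approximately 100‐fold coverage" in the abstract above can be checked with simple arithmetic: mean sequencing depth is the total number of sequenced bases divided by the genome size. The sketch below assumes the 883.2 Mb preliminary assembly size is a reasonable stand-in for the genome size (the abstract does not state which size estimate the authors used); the function name `fold_coverage` is illustrative.

```python
def fold_coverage(total_bases, genome_size):
    """Mean sequencing depth: sequenced bases divided by genome size (same units)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures from the abstract: 87.9 Gb of nanopore sequence; 883.2 Mb preliminary
# assembly (assumed here to approximate the true genome size).
depth = fold_coverage(87.9e9, 883.2e6)
print(f"{depth:.1f}x")  # roughly 100-fold, consistent with the abstract
```

The same ratio, applied to the final figures, also reproduces the anchored fraction: 881.3 Mb of the 886.5 Mb scaffold assembly is about 99.4%.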

13.
The iconic orange clownfish, Amphiprion percula, is a model organism for studying the ecology and evolution of reef fishes, including patterns of population connectivity, sex change, social organization, habitat selection and adaptation to climate change. Notably, the orange clownfish is the only reef fish for which a complete larval dispersal kernel has been established and was the first fish species for which it was demonstrated that antipredator responses of reef fishes could be impaired by ocean acidification. Despite its importance, molecular resources for this species remain scarce and until now it lacked a reference genome assembly. Here, we present a de novo chromosome‐scale assembly of the genome of the orange clownfish Amphiprion percula. We utilized single‐molecule real‐time sequencing technology from Pacific Biosciences to produce an initial polished assembly comprising 1,414 contigs, with a contig N50 length of 1.86 Mb. Using Hi‐C‐based chromatin contact maps, 98% of the genome assembly was placed into 24 chromosomes, resulting in a final assembly of 908.8 Mb in length with contig and scaffold N50s of 3.12 and 38.4 Mb, respectively. This makes it one of the most contiguous and complete fish genome assemblies currently available. The genome was annotated with 26,597 protein‐coding genes and contains 96% of the core set of conserved actinopterygian orthologs. The availability of this reference genome assembly as a community resource will further strengthen the role of the orange clownfish as a model species for research on the ecology and evolution of reef fishes.

14.
15.
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole‐genome shotgun sequencing of the nuclear genome of flax. Seven paired‐end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep‐coverage (approximately 94× raw, approximately 69× filtered) short‐sequence reads (44–100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non‐redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole‐genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis‐assembly of regions at the genome scale. A total of 43 384 protein‐coding genes were predicted in the whole‐genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5–9 MYA) whole‐genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam‐A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole‐genome shotgun short‐sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species.

16.
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.

A greatly improved reference genome sequence of barley was assembled from accurate long reads.

17.
Next‐generation whole‐genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence‐based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost‐efficient establishment of powerful genomic information for many species.

18.
Camelids are characterized by their unique adaptive immune system that exhibits the generation of homodimeric heavy‐chain immunoglobulins, somatic hypermutation of T‐cell receptors, and low genetic diversity of major histocompatibility complex (MHC) genes. However, short‐read assemblies are typically highly fragmented in these gene loci owing to their repetitive and polymorphic nature. Here, we constructed a chromosome‐level assembly of the wild Bactrian camel genome based on high‐coverage long‐read sequencing and chromatin interaction mapping. The assembly, with a contig N50 of 5.37 Mb and a scaffold N50 of 76.03 Mb, represents the most contiguous camelid genome to date. The genomic organization of the immunoglobulin heavy‐chain locus was similar between the wild Bactrian camel and alpaca, and genes encoding conventional and heavy‐chain antibodies were intermixed. The organizations of two immunoglobulin light‐chain loci and four T‐cell receptor loci were also fully deciphered using the new assembly. Additionally, the complete classical MHC region was resolved into a single contig. The high‐quality assembly presented here provides an essential reference for future investigations examining the camelid immune system.

19.

Background  

The genome of Anopheles gambiae, the major vector of malaria, was sequenced and assembled in 2002. This initial genome assembly and analysis made available to the scientific community was complicated by the presence of assembly issues, such as scaffolds with no chromosomal location, no sequence data for the Y chromosome, haplotype polymorphisms resulting in two different genome assemblies in limited regions and contaminating bacterial DNA.

20.
Decreasing sequencing costs have driven a rapid expansion of novel genotyping methods. One of these methods is the exploitation of restriction enzyme cut sites to generate genome‐wide but reduced representation sequencing libraries (RRLs), alternatively termed genotyping by sequencing or restriction‐site associated DNA sequencing. Without a reference genome, the resulting short sequence reads must be assembled de novo. There are many possible assembly programs, most not explicitly developed for RRL data, and we know little of their effectiveness. In this issue of Molecular Ecology Resources, LaCava et al. (2020) systematically evaluate six commonly used programs and two commonly varied parameters for complete and accurate assembly of RRLs, using simulated double digests of Homo sapiens and Arabidopsis thaliana genomes with varied mutation rates and types. The authors find substantial variation in performance across assembly programs. The most consistently high‐performing assembler is infrequently used in their literature survey (CD‐HIT; Li and Godzik, 2006), while several others fail to produce complete, accurate assemblies under many conditions. LaCava et al. additionally recommend best practices in parameter choice and evaluation of future assembly programs; this is advice that molecular ecologists working to assemble sequences of all kinds should take to heart.
