Similar literature
 20 similar documents found.
1.
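The monokaryotic enoki mushroom strain "6-3" was sequenced with both second- and third-generation sequencing technologies, and four assembly strategies were applied for de novo genome assembly and compared. In terms of assembly metrics, the assembly built from second-generation data alone performed worst: contigs longer than 10 kb totaled only 24.6 Mb, the contig N50 was only 23 kb, and the assembly rate was only 59.27%. Third-generation assembly followed by second-generation correction performed best: contigs longer than 10 kb totaled 38.3 Mb, the contig N50 was 2.8 Mb, and the assembly rate reached 92.16%. In terms of conserved single-copy genes, the genome sequences from all four strategies were compared against the conserved single-copy genes of basidiomycetes in the BUSCO database, and gene completeness exceeded 94% in every case. In terms of assembly accuracy, PCR amplification and Sanger sequencing confirmed that the genome sequence obtained by third-generation assembly with second-generation correction was complete and contiguous, and it also contained the fewest SNPs and InDels. In summary, third-generation assembly with second-generation correction yields a genome sequence with a large contig N50, a high assembly rate, and high base accuracy, making it a relatively ideal approach for sequencing the genomes of edible fungi.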
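Contig N50 is the statistic used throughout these abstracts to compare assemblies. The following is a minimal sketch of how it is commonly computed, not the procedure of any particular study; the function name `n50` and the example contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in kb (illustrative only, not study data).
example_contigs_kb = [2800, 1500, 900, 400, 250, 100, 50]
print(n50(example_contigs_kb))  # 1500
```

Because N50 depends only on the sorted lengths, a few very long contigs can raise it sharply, which is why long-read assemblies typically report much larger values than short-read ones.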

2.
Kobresia species are common in meadows on the Qinghai–Tibet Plateau. They are important food resources for local livestock and serve as a critical foundation for ecosystem integration. Genetic resources of Kobresia species are scarce. Here, we generated a chromosome-level genome assembly for K. myosuroides (Cyperaceae), using PacBio long reads, Illumina short reads, and Hi-C technology. The final assembly had a total size of 399.9 Mb with a contig N50 value of 11.9 Mb. The Hi-C result supported a model of 29 pseudomolecules, which was consistent with cytological results. A total of 185.5 Mb (44.89% of the genome) of transposable elements was detected, and 26,748 protein-coding genes were predicted. Comparative analysis revealed that Kobresia plants experienced recent diversification events during the late Miocene to Pliocene. Karyotype analysis indicated that the fission and fusion of chromosomes have been a major driver of speciation, which is consistent with the lack of whole-genome duplication (WGD) in the K. myosuroides genome. Overall, this high-quality reference genome provides insights into the evolution of alpine sedges and may be helpful for endemic forage improvement and alpine ecosystem preservation.

3.
Taro (Colocasia esculenta (L.) Schott), from the Araceae family, is one of the oldest crops with important edible, medicinal, nutritional and economic value. Taro is a highly polymorphic species including diverse genotypes adapted to a broad range of environments, but the taro genome has rarely been investigated. Here, a high‐quality chromosome‐level genome of C. esculenta was assembled using data sequenced by Illumina, PacBio and Nanopore platforms. The assembled genome size was 2,405 Mb with a contig N50 of 400.0 kb and a scaffold N50 of 159.4 Mb. In total, 2,311 Mb (96.09%) of the contig sequences was anchored onto 14 chromosomes to form pseudomolecules, and 2,126 Mb (88.43%) was annotated as repetitive sequences. Of the 28,695 predicted protein‐coding genes, 26,215 genes (91.4%) could be functionally annotated. On the basis of phylogenetic analysis using 769 genes, C. esculenta and Spirodela polyrhiza were placed on one branch of the tree that diverged approximately 73.23 million years ago. The synteny analyses showed that there have been two whole‐genome duplication events in C. esculenta separated by a relatively short gap. According to comparative genome analysis, a larger number (1,189) of distinct gene families and long terminal repeats were enriched in C. esculenta. Our high‐quality taro genome will provide valuable resources for further genetic, ecological and evolutionary analyses of taro or other species in the Araceae.

4.
Onychostoma macrolepis is an emerging commercial cyprinid fish species. It is a model system for studies of sexual dimorphism and genome evolution. Here, we report the chromosome‐level assembly of the O. macrolepis genome obtained from the integration of nanopore long‐read sequencing with physical maps produced using Bionano and Hi‐C technology. A total of 87.9 Gb of nanopore sequence provided approximately 100‐fold coverage of the genome. The preliminary genome assembly was 883.2 Mb in size with a contig N50 size of 11.2 Mb. The 969 corrected contigs obtained from Bionano optical mapping were assembled into 853 scaffolds and produced an assembly of 886.5 Mb with a scaffold N50 of 16.5 Mb. Finally, using the Hi‐C data, 881.3 Mb (99.4% of genome) in 526 scaffolds were anchored and oriented in 25 chromosomes ranging in size from 25.27 to 56.49 Mb. In total, 24,770 protein‐coding genes were predicted in the genome, and ~96.85% of the genes were functionally annotated. The annotated assembly contains 93.3% complete genes from the BUSCO reference set. In addition, we identified 409 Mb (46.23% of the genome) of repetitive sequence, and 11,213 non‐coding RNAs, in the genome. Evolutionary analysis revealed that O. macrolepis diverged from common carp approximately 24.25 million years ago. The chromosomes of O. macrolepis showed an unambiguous correspondence to the chromosomes of zebrafish. The high‐quality genome assembled in this work provides a valuable genomic resource for further biological and evolutionary studies of O. macrolepis.
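The "approximately 100‐fold coverage" figure can be checked with simple arithmetic: fold coverage is total sequenced bases divided by genome size. This is a minimal sketch that assumes the preliminary assembly size (883.2 Mb) as the denominator; the abstract does not say which genome-size estimate the authors used, and the function name `fold_coverage` is hypothetical.

```python
GB = 1e9  # bases per gigabase
MB = 1e6  # bases per megabase


def fold_coverage(sequenced_bases, genome_size_bases):
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size_bases


# Figures quoted in the abstract; using the preliminary assembly size
# as the genome size is an assumption made for this illustration.
nanopore_bases = 87.9 * GB
assembly_size = 883.2 * MB

coverage = fold_coverage(nanopore_bases, assembly_size)
print(f"{coverage:.1f}x")
```

The result, roughly 99.5x, agrees with the rounded value in the abstract under this assumption.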

5.
Soybean was domesticated in China and has become one of the most important oilseed crops. Due to bottlenecks in their introduction and dissemination, soybeans from different geographic areas exhibit extensive genetic diversity. Asia is the largest soybean market; therefore, a high-quality soybean reference genome from this area is critical for soybean research and breeding. Here, we report the de novo assembly and sequence analysis of a Chinese soybean genome for "Zhonghuang 13" by a combination of SMRT, Hi-C and optical mapping data. The assembled genome size is 1.025 Gb with a contig N50 of 3.46 Mb and a scaffold N50 of 51.87 Mb. Comparisons between this genome and the previously reported reference genome (cv. Williams 82) uncovered more than 250,000 structural variations. A total of 52,051 protein-coding genes and 36,429 transposable elements were annotated for this genome, and a gene co-expression network including 39,967 genes was also established. This high-quality Chinese soybean genome and its sequence analysis will provide valuable information for soybean improvement in the future.

6.
Soybean was domesticated in China and has become one of the most important oilseed crops. Due to bottlenecks in their introduction and dissemination, soybeans from different geographic areas exhibit extensive genetic diversity. Asia is the largest soybean market; therefore, a high-quality soybean reference genome from this area is critical for soybean research and breeding. Here, we report the de novo assembly and sequence analysis of a Chinese soybean genome for “Zhonghuang 13” by a combination of SMRT, Hi-C and optical mapping data. The assembled genome size is 1.025 Gb with a contig N50 of 3.46 Mb and a scaffold N50 of 51.87 Mb. Comparisons between this genome and the previously reported reference genome (cv. Williams 82) uncovered more than 250,000 structural variations. A total of 52,051 protein-coding genes and 36,429 transposable elements were annotated for this genome, and a gene co-expression network including 39,967 genes was also established. This high-quality Chinese soybean genome and its sequence analysis will provide valuable information for soybean improvement in the future.

7.
The red‐spotted grouper Epinephelus akaara (E. akaara) is one of the most economically important marine fish in China, Japan and South‐East Asia and is a threatened species. The species is also considered a good model for studies of sex inversion, development, genetic diversity and immunity. Despite its importance, molecular resources for E. akaara remain limited and no reference genome has been published to date. In this study, we constructed a chromosome‐level reference genome of E. akaara by taking advantage of long‐read single‐molecule sequencing and de novo assembly by Oxford Nanopore Technology (ONT) and Hi‐C. A red‐spotted grouper genome of 1.135 Gb was assembled from a total of 106.29 Gb polished Nanopore sequence (GridION, ONT), equivalent to 96‐fold genome coverage. The assembled genome represents 96.8% completeness (BUSCO) with a contig N50 length of 5.25 Mb and a longest contig of 25.75 Mb. The contigs were clustered and ordered onto 24 pseudochromosomes covering approximately 95.55% of the genome assembly with Hi‐C data, with a scaffold N50 length of 46.03 Mb. The genome contained 43.02% repeat sequences and 5,480 noncoding RNAs. Furthermore, combined with several RNA‐seq data sets, 23,808 (99.5%) genes were functionally annotated from a total of 23,923 predicted protein‐coding sequences. The high‐quality chromosome‐level reference genome of E. akaara was assembled for the first time and will be a valuable resource for molecular breeding and functional genomics studies of red‐spotted grouper in the future.

8.
Caper spurge, Euphorbia lathyris L., is an important energy crop and medicinal crop. Here, we generated a high-quality, chromosome-level genome assembly of caper spurge using Oxford Nanopore sequencing, Illumina sequencing, and Hi-C technology. The final genome assembly was ∼988.9 Mb in size, 99.8% of which could be grouped into 10 pseudochromosomes, with contig and scaffold N50 values of 32.6 and 95.7 Mb, respectively. A total of 651.4 Mb of repetitive sequences and 36,342 protein-coding genes were predicted in the genome assembly. Comparative genomic analysis showed that caper spurge and castor bean clustered together. We found that no independent whole-genome duplication event had occurred in caper spurge after its split from the castor bean, and recent substantial amplification of long terminal repeat retrotransposons has contributed significantly to its genome expansion. Furthermore, based on gene homology searching, we identified a number of candidate genes involved in the biosynthesis of fatty acids and triacylglycerols. The reference genome presented here will be highly useful for the further study of the genetics, genomics, and breeding of this high-value crop, as well as for evolutionary studies of the spurge family and angiosperms.

9.
10.
Members of the family Tetraodontidae are known to have relatively small and compact genomes compared to other vertebrates. The obscure puffer fish Takifugu obscurus is an anadromous species that migrates to freshwater from the sea for spawning. Thus the euryhaline characteristics of T. obscurus have been investigated to gain understanding of their survival ability, osmoregulation, and other homeostatic mechanisms in both freshwater and seawater. In this study, a high quality chromosome‐level reference genome for T. obscurus was constructed using long‐read Pacific Biosciences (PacBio) Sequel sequencing and a Hi‐C‐based chromatin contact map platform. The final genome assembly of T. obscurus is 381 Mb, with a contig N50 length of 3,296 kb and longest length of 10.7 Mb, from a total of 62 Gb of raw reads generated using single‐molecule real‐time sequencing technology from a PacBio Sequel platform. The PacBio data were further clustered into chromosome‐scale scaffolds using a Hi‐C approach, resulting in a 373 Mb genome assembly with a contig N50 length of 15.2 Mb and longest length of 28 Mb. When we directly compared the 22 longest scaffolds of T. obscurus to the 22 chromosomes of the tiger puffer Takifugu rubripes, a clear one‐to‐one orthologous relationship was observed between the two species, supporting the chromosome‐level assembly of T. obscurus. This genome assembly can serve as a valuable genetic resource for exploring fugu‐specific compact genome characteristics, and will provide essential genomic information for understanding molecular adaptations to salinity fluctuations and the evolution of osmoregulatory mechanisms.

11.
The ascomycete Venturia inaequalis, the causal pathogen of apple scab, has a gene-for-gene relationship with its host plant apple (Malus spp.). 'Golden Delicious', one of the most commonly cultivated apples in the world, carries the ephemeral resistance gene Vg. The avirulence gene AvrVg, which matches resistance gene Vg, has recently been mapped on the V. inaequalis genome. In this paper, we present the construction of a BAC library from a V. inaequalis AvrVg isolate. The library is composed of 7680 clones, with an average insert size of 80 kb. By hybridization, it has been estimated that the library contains six haploid genome equivalents. Thus the V. inaequalis genome can be predicted to be approximately 100 Mb in size. A chromosome walk, starting from the marker VirQ5 co-segregating with AvrVg, has been performed using the BAC library. Twelve BAC clones were identified during four steps of the chromosome walking. The size of the resulting contig is approximately 330 kb.
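The genome-size prediction in this abstract follows from the library statistics: total cloned DNA (clones times mean insert size) divided by the number of haploid genome equivalents. This is a minimal sketch of that arithmetic using only the figures quoted above; the function name `genome_size_from_library` is hypothetical.

```python
def genome_size_from_library(n_clones, mean_insert_kb, genome_equivalents):
    """Estimate haploid genome size in kb from clone-library coverage.

    total cloned DNA = n_clones * mean_insert_kb
    genome size      = total cloned DNA / genome_equivalents
    """
    if genome_equivalents <= 0:
        raise ValueError("genome_equivalents must be positive")
    return n_clones * mean_insert_kb / genome_equivalents


# Figures quoted in the abstract: 7680 clones, 80 kb inserts, 6 equivalents.
size_kb = genome_size_from_library(7680, 80, 6)
print(f"{size_kb / 1000:.1f} Mb")  # 102.4 Mb
```

The 102.4 Mb result matches the abstract's "approximately 100 Mb".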

12.
The ladybird beetle Propylea japonica is an important natural enemy in agro‐ecological systems. Studies on the strong tolerance of P. japonica to high temperatures and insecticides, and its population and phenotype diversity have recently increased. However, abundant genome resources for obtaining insights into stress‐resistance mechanisms and genetic intra‐species diversity for P. japonica are lacking. Here, we constructed the P. japonica genome maps using Pacific Biosciences (PacBio) and Illumina sequencing technologies. The genome size was 850.90 Mb with a contig N50 of 813.13 kb. The Hi‐C sequence data were used to upgrade draft genome assemblies; 4,777 contigs were assembled to 10 chromosomes; and the final draft genome assembly was 803.93 Mb with a contig N50 of 813.98 kb and a scaffold N50 of 100.34 Mb. Approximately 495.38 Mb of repeated sequences was annotated. A total of 18,018 protein‐coding genes were predicted, of which 95.78% were functionally annotated, and 1,407 genes were species‐specific. The phylogenetic analysis showed that P. japonica diverged from the ancestor of Anoplophora glabripennis and Tribolium castaneum ~236.21 million years ago. We detected that some important gene families involved in detoxification of pesticides and tolerance to heat stress were expanded in P. japonica, especially cytochrome P450 and Hsp70 genes. Overall, the high‐quality draft genome sequence of P. japonica will provide an invaluable resource for understanding the molecular mechanisms of stress resistance and will facilitate research on the population genetics, evolution and phylogeny of Coccinellidae. This genome will also provide new avenues for conserving the diversity of predator insects.

13.
Japanese chestnut (Castanea crenata Sieb. et Zucc.), unlike other Castanea species, is resistant to most diseases and wasps. However, genomic data of Japanese chestnut that could be used to determine its biotic stress resistance mechanisms have not been reported to date. In this study, we employed long-read sequencing and genetic mapping to generate genome sequences of Japanese chestnut at the chromosome level. Long reads (47.7 Gb; 71.6× genome coverage) were assembled into 781 contigs, with a total length of 721.2 Mb and a contig N50 length of 1.6 Mb. Genome sequences were anchored to the chestnut genetic map, comprising 14,973 single nucleotide polymorphisms (SNPs) and covering 1,807.8 cM map distance, to establish a chromosome-level genome assembly (683.8 Mb), with 69,980 potential protein-encoding genes and 425.5 Mb repetitive sequences. Furthermore, comparative genome structure analysis revealed that Japanese chestnut shares conserved chromosomal segments with woody plants, but not with herbaceous plants, of rosids. Overall, the genome sequence data of Japanese chestnut generated in this study is expected to enhance not only its genetics and genomics but also the evolutionary genomics of woody rosids.

14.
Peach (Prunus persica L. Batsch) is an economically important fruit crop worldwide. Although a high-quality peach genome has previously been published, Sanger sequencing was used for its assembly, which generated short contigs. Here, we report a chromosome-level genome assembly and sequence analysis of Chinese Cling, an important founder cultivar for peach breeding programs worldwide. The assembled genome contained 247.33 Mb with a contig N50 of 4.13 Mb and a scaffold N50 of 29.68 Mb, representing 99.8% of the estimated genome. Comparisons between this genome and the recently published one (Lovell peach) uncovered 685 407 single nucleotide polymorphisms, 162 655 insertions and deletions, and 16 248 structural variants. Gene family analysis highlighted the contraction of the gene families involved in flavone, flavonol, flavonoid, and monoterpenoid biosynthesis. Subsequently, the volatile compounds of 256 peach varieties were quantitated in mature fruits in 2015 and 2016 to perform a genome-wide association analysis. A comparison with the identified domestication genomic regions allowed us to identify 25 quantitative trait loci, associated with seven volatile compounds, in the domestication region, which is consistent with the differences in volatile compounds between wild and cultivated peaches. Finally, a gene encoding terpene synthase, located within a previously reported quantitative trait locus region, was identified to be associated with linalool synthesis. Such findings highlight the importance of this new assembly for the analysis of evolutionary mechanisms and gene identification in peach species. Furthermore, this high-quality peach genome provides valuable information for future fruit improvement.

15.
The greenfin horse‐faced filefish, Thamnaconus septentrionalis, is a valuable commercial fish species that is widely distributed in the Indo‐West Pacific Ocean. This fish has characteristic blue–green fins, rough skin and a spine‐like first dorsal fin. Thamnaconus septentrionalis is of conservation concern because its population has declined sharply, and it is an important marine aquaculture fish species in China. Genomic resources for the filefish are lacking, and no reference genome has been released. In this study, the first chromosome‐level genome of T. septentrionalis was constructed using nanopore sequencing and Hi‐C technology. A total of 50.95 Gb polished nanopore sequences were generated and were assembled into a 474.31‐Mb genome, accounting for 96.45% of the estimated genome size of this filefish. The assembled genome contained only 242 contigs, and the achieved contig N50 was 22.46 Mb, a surprisingly high value among all sequenced fish species. Hi‐C scaffolding of the genome resulted in 20 pseudochromosomes containing 99.44% of the total assembled sequences. The genome contained 67.35 Mb of repeat sequences, accounting for 14.2% of the assembly. A total of 22,067 protein‐coding genes were predicted, 94.82% of which were successfully annotated with putative functions. Furthermore, a phylogenetic tree was constructed using 1,872 single‐copy orthologous genes, and 67 unique gene families were identified in the filefish genome. This high‐quality assembled genome will be a valuable resource for a range of future genomic, conservation and breeding studies of T. septentrionalis.

16.
A new YAC (yeast artificial chromosome) physical map of the 12 rice chromosomes was constructed utilizing the latest molecular linkage map. The 1439 DNA markers on the rice genetic map selected a total of 1892 YACs from a YAC library. A total of 675 distinct YACs were assigned to specific chromosomal locations. In all chromosomes, 297 YAC contigs and 142 YAC islands were formed. The total physical length of these contigs and islands was estimated to be 270 Mb, which corresponds to approximately 63% of the entire rice genome (430 Mb). Because the physical length of each YAC contig has been measured, we could then estimate the physical distance between genetic markers more precisely than previously. In the course of constructing the new physical map, the DNA markers mapped at 0.0-cM intervals were ordered accurately and the presence of potentially duplicated regions among the chromosomes was detected. The physical map combined with the genetic map will form the basis for elucidation of the rice genome structure, map-based cloning of agronomically important genes, and genome sequencing.

17.
We generated a sequence-ready BAC/PAC contig spanning approximately 5.5 Mb on porcine chromosome 6q1.2, which represents a very gene-rich genome region. STS content mapping was used as the main strategy for the assembly of the contig and a total of 6 microsatellite markers, 53 gene-related STS and 116 STS corresponding to BAC and PAC end sequences were analyzed. The contig comprises 316 BAC and PAC clones covering the region between the genes GPI and LIPE. The correct contig assembly was verified by RH-mapping of STS markers and comparative mapping of BAC/PAC end sequences using BLAST searches. The use of microsatellite primer pairs allowed the integration of the physical maps with the genetic map of this region. Comparative mapping of the porcine BAC/PAC contig with respect to the gene-rich region on the human chromosome 19q13.1 map revealed a completely conserved gene order of this segment; however, physical distances differ somewhat between HSA19q13.1 and SSC6q1.2. Three major differences in DNA content between human and pig are found in two large intergenic regions and in one region of a clustered gene family, respectively. While there is a complete conservation of gene order between pig and human, the comparative analysis with respect to the rodent species mouse and rat shows one breakpoint where a genome segment is inverted.

18.
Oil camellia trees are important woody plants for the production of high-quality cooking oil. In contrast to their economic importance, their genetic and genomic resources are very limited, which greatly hampers genetic studies on oil camellia trees. Microsatellites, or simple sequence repeats (SSRs), have great value in many aspects of genetic analysis due to their high polymorphism and codominant inheritance. In this study, we report the large-scale development and characterization of SSR markers derived from genomic sequences of Camellia chekiangoleosa by high-throughput pyrosequencing technology. A total of 1,091,393 genomic shotgun reads were generated using a Roche 454 FLX sequencer; the average read length was 319 bp, and the total sequence throughput was 347.9 Mb. These sequences were assembled into 35,315 contigs with a total length of 14.8 Mb and an N50 contig size of 770 bp. Using the microsatellite identification tool MISA, a total of 5,844 perfect microsatellites were detected in the assembled sequences. Among them, tetranucleotide repeats were found to be the most frequent microsatellites in the genome of C. chekiangoleosa, and all the dominant repeat motifs for different types of SSRs were found to be rich in A/T. Experimental analysis with 900 SSR primer pairs revealed that 66% of them succeeded in PCR amplification. Further investigation with 345 SSR primer pairs showed that a relatively high percentage of primers amplified polymorphic loci (31.9%). Experimental data also revealed that, overall, long microsatellite repeats (>20 bp) were more variable than short ones (<20 bp) in the genome of the oil camellia tree.
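Detecting perfect microsatellites amounts to scanning a sequence for a short motif repeated in tandem a minimum number of times. The sketch below illustrates that idea with regular expressions. It is not the MISA implementation, and the thresholds in `MIN_REPEATS`, the function names, and the example sequence are all hypothetical.

```python
import re

# Hypothetical minimum repeat counts per motif length; illustrative
# thresholds only, not the settings used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Yield (start, motif, repeat_count) for perfect tandem repeats."""
    sequence = sequence.upper()
    for motif_len, min_count in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_count - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                yield match.start(), motif, len(match.group(0)) // motif_len


# Made-up sequence containing (AT)x7 and (GATA)x5.
example = "CCG" + "AT" * 7 + "GGC" + "GATA" * 5 + "TTC"
for hit in find_perfect_ssrs(example):
    print(hit)
```

The `is_primitive` check keeps a dinucleotide run such as (AT)n from being reported a second time as a tetranucleotide (ATAT)n repeat.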

19.
Culex pipiens molestus and Culex pipiens pallens are two distinct bioforms in the Culex pipiens complex that are important vectors of several pathogens and are widely distributed around the world. In the current study, we present a high-quality chromosome-level genome of Cx. pipiens f. molestus and describe the genetic characteristics of this genome. The assembled genome was 559.749 Mb with scaffold and contig N50 values of 200.952 Mb and 0.370 Mb, respectively, and more than 94.78% of the assembled bases were located on three chromosomes. A total of 19,399 protein-coding genes were predicted. Many gene families were expanded in the genome of Cx. pipiens f. molestus, particularly the chemosensory protein (CSP) and gustatory receptor (GR) gene families. In addition, utilizing Hi-C data, we improved the previously assembled draft genome of Cx. pipiens f. pallens, obtaining a scaffold N50 of 186.195 Mb and a contig N50 of 0.749 Mb, with more than 97.02% of the assembled bases located on three chromosomes. This reference genome provides a foundation for genome-based investigations of the unique ecological and evolutionary characteristics of Cx. pipiens f. molestus, and the findings in this study will help to elucidate the mechanisms involved in species divergence in the Culex pipiens complex.

20.
Camelids are characterized by their unique adaptive immune system that exhibits the generation of homodimeric heavy‐chain immunoglobulins, somatic hypermutation of T‐cell receptors, and low genetic diversity of major histocompatibility complex (MHC) genes. However, short‐read assemblies are typically highly fragmented in these gene loci owing to their repetitive and polymorphic nature. Here, we constructed a chromosome‐level assembly of the wild Bactrian camel genome based on high‐coverage long‐read sequencing and chromatin interaction mapping. The assembly, with a contig N50 of 5.37 Mb and a scaffold N50 of 76.03 Mb, represents the most contiguous camelid genome to date. The genomic organization of the immunoglobulin heavy‐chain locus was similar between the wild Bactrian camel and alpaca, and genes encoding conventional and heavy‐chain antibodies were intermixed. The organizations of two immunoglobulin light‐chain loci and four T cell receptor loci were also fully deciphered using the new assembly. Additionally, the complete classical MHC region was resolved into a single contig. The high‐quality assembly presented here provides an essential reference for future investigations examining the camelid immune system.
