Found 20 similar articles (search time: 0 ms)
1.
Hui Ge Kebing Lin Mi Shen Shuiqing Wu Yilei Wang Ziping Zhang Zhiyong Wang Yong Zhang Zhen Huang Chen Zhou Qi Lin Jianshao Wu Lei Liu Jiang Hu Zhongchi Huang Leyun Zheng 《Molecular ecology resources》2019,19(6):1461-1469
The red‐spotted grouper Epinephelus akaara (E. akaara) is one of the most economically important marine fish in China, Japan and South‐East Asia and is a threatened species. The species is also considered a good model for studies of sex inversion, development, genetic diversity and immunity. Despite its importance, molecular resources for E. akaara remain limited and no reference genome has been published to date. In this study, we constructed a chromosome‐level reference genome of E. akaara by taking advantage of long‐read single‐molecule sequencing and de novo assembly by Oxford Nanopore Technology (ONT) and Hi‐C. A red‐spotted grouper genome of 1.135 Gb was assembled from a total of 106.29 Gb polished Nanopore sequence (GridION, ONT), equivalent to 96‐fold genome coverage. The assembled genome represents 96.8% completeness (BUSCO) with a contig N50 length of 5.25 Mb and a longest contig of 25.75 Mb. The contigs were clustered and ordered onto 24 pseudochromosomes covering approximately 95.55% of the genome assembly with Hi‐C data, with a scaffold N50 length of 46.03 Mb. The genome contained 43.02% repeat sequences and 5,480 noncoding RNAs. Furthermore, combined with several RNA‐seq data sets, 23,808 (99.5%) genes were functionally annotated from a total of 23,923 predicted protein‐coding sequences. The high‐quality chromosome‐level reference genome of E. akaara was assembled for the first time and will be a valuable resource for molecular breeding and functional genomics studies of red‐spotted grouper in the future.
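The contig and scaffold N50 values quoted in this and the following abstracts follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the assembly. A minimal sketch of that calculation is given here; the function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from any of the listed papers.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))  # 8: the two longest contigs (10 + 8) cover >= 15 of 30
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstracts report it alongside completeness measures such as BUSCO.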
2.
Filip Wierzbicki Florian Schwarz Odontsetseg Cannalonga Robert Kofler 《Molecular ecology resources》2022,22(1):102-121
In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics are ignorant of TEs. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effect of assembly algorithm, polishing, read length, coverage, residual polymorphisms and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
3.
Seunghyun Kang Jin‐Hyoung Kim Euna Jo Seung Jae Lee Jihye Jung Bo‐Mi Kim Jun Hyuck Lee Tae‐Jin Oh Seungshic Yum Jae‐Sung Rhee Hyun Park 《Molecular ecology resources》2020,20(2):520-530
The Tetraodontidae family are known to have relatively small and compact genomes compared to other vertebrates. The obscure puffer fish Takifugu obscurus is an anadromous species that migrates to freshwater from the sea for spawning. Thus the euryhaline characteristics of T. obscurus have been investigated to gain understanding of their survival ability, osmoregulation, and other homeostatic mechanisms in both freshwater and seawater. In this study, a high quality chromosome‐level reference genome for T. obscurus was constructed using long‐read Pacific Biosciences (PacBio) Sequel sequencing and a Hi‐C‐based chromatin contact map platform. The final genome assembly of T. obscurus is 381 Mb, with a contig N50 length of 3,296 kb and longest length of 10.7 Mb, from a total of 62 Gb of raw reads generated using single‐molecule real‐time sequencing technology from a PacBio Sequel platform. The PacBio data were further clustered into chromosome‐scale scaffolds using a Hi‐C approach, resulting in a 373 Mb genome assembly with a contig N50 length of 15.2 Mb and a longest length of 28 Mb. When we directly compared the 22 longest scaffolds of T. obscurus to the 22 chromosomes of the tiger puffer Takifugu rubripes, a clear one‐to‐one orthologous relationship was observed between the two species, supporting the chromosome‐level assembly of T. obscurus. This genome assembly can serve as a valuable genetic resource for exploring fugu‐specific compact genome characteristics, and will provide essential genomic information for understanding molecular adaptations to salinity fluctuations and the evolution of osmoregulatory mechanisms.
4.
Jianmei Yin Lu Jiang Li Wang Xiaoyong Han Wenqi Guo Chunhong Li Yi Zhou Matthew Denton Peitong Zhang 《Molecular ecology resources》2021,21(1):68-77
Taro (Colocasia esculenta (L.) Schott), from the Araceae family, is one of the oldest crops with important edible, medicinal, nutritional and economic value. Taro is a highly polymorphic species including diverse genotypes adapted to a broad range of environments, but the taro genome has rarely been investigated. Here, a high‐quality chromosome‐level genome of C. esculenta was assembled using data sequenced by Illumina, PacBio and Nanopore platforms. The assembled genome size was 2,405 Mb with a contig N50 of 400.0 kb and a scaffold N50 of 159.4 Mb. In total, 2,311 Mb (96.09%) of the contig sequences was anchored onto 14 chromosomes to form pseudomolecules, and 2,126 Mb (88.43%) was annotated as repetitive sequences. Of the 28,695 predicted protein‐coding genes, 26,215 genes (91.4%) could be functionally annotated. On the basis of phylogenetic analysis using 769 genes, C. esculenta and Spirodela polyrhiza were placed on one branch of the tree that diverged approximately 73.23 million years ago. The synteny analyses showed that there have been two whole‐genome duplication events in C. esculenta separated by a relatively short gap. According to comparative genome analysis, a larger number (1,189) of distinct gene families and long terminal repeats were enriched in C. esculenta. Our high‐quality taro genome will provide valuable resources for further genetic, ecological and evolutionary analyses of taro or other species in the Araceae.
5.
Corinna Breusing Darrin T. Schultz Sebastian Sudek Alexandra Z. Worden Curtis Robert Young 《Molecular ecology resources》2020,20(5):1432-1444
Symbiotic relationships between vestimentiferan tubeworms and chemosynthetic Gammaproteobacteria build the foundations of many hydrothermal vent and hydrocarbon seep ecosystems in the deep sea. The association between the vent tubeworm Riftia pachyptila and its endosymbiont Candidatus Endoriftia persephone has become a model system for symbiosis research in deep‐sea vestimentiferans, while markedly fewer studies have investigated symbiotic relationships in other tubeworm species, especially at cold seeps. Here we sequenced the endosymbiont genome of the tubeworm Lamellibrachia barhami from a cold seep in the Gulf of California, using short‐ and long‐read sequencing technologies in combination with Hi‐C and Dovetail Chicago libraries. Our final assembly had a size of ~4.17 MB, a GC content of 54.54%, 137X coverage, 4153 coding sequences, and a CheckM completeness score of 97.19%. A single scaffold contained 99.51% of the genome. Comparative genomic analyses indicated that the L. barhami symbiont shares a set of core genes and many metabolic pathways with other vestimentiferan symbionts, while containing 433 unique gene clusters that comprised a variety of transposases, defence‐related genes and a lineage‐specific CRISPR/Cas3 system. This assembly represents the most contiguous tubeworm symbiont genome resource to date and will be particularly valuable for future comparative genomic studies investigating structural genome evolution, physiological adaptations and host‐symbiont communication in chemosynthetic animal‐microbe symbioses.
6.
7.
Valentina Peona Mozes P. K. Blom Luohao Xu Reto Burri Shawn Sullivan Ignas Bunikis Ivan Liachko Tri Haryoko Knud A. Jønsson Qi Zhou Martin Irestedt Alexander Suh 《Molecular ecology resources》2021,21(1):263-286
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich and GC‐rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long‐read, linked‐read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC‐rich microchromosomes and the repeat‐rich W chromosome. Telomere‐to‐telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
8.
Mohammed-Amin Madoui Stefan Engelen Corinne Cruaud Caroline Belser Laurie Bertrand Adriana Alberti Arnaud Lemainque Patrick Wincker Jean-Marc Aury 《BMC genomics》2015,16(1)
Background
Long-read sequencing technologies were launched a few years ago, and in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover by providing long-range information, it could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments.
Results
The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging as existing assemblers were not implemented to deal with long reads exhibiting close to 30% of errors. Here, we presented a hybrid approach developed to take advantage of data generated using MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1 and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads.
Conclusions
We described NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time even in small facilities. Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1519-z) contains supplementary material, which is available to authorized users.
9.
Lina Sun Tian Gao Feilong Wang Zuliang Qin Longxia Yan Wenjing Tao Minghui Li Canbiao Jin Li Ma Thomas D. Kocher Deshou Wang 《Molecular ecology resources》2020,20(5):1361-1371
Onychostoma macrolepis is an emerging commercial cyprinid fish species. It is a model system for studies of sexual dimorphism and genome evolution. Here, we report the chromosome‐level assembly of the O. macrolepis genome obtained from the integration of nanopore long‐read sequencing with physical maps produced using Bionano and Hi‐C technology. A total of 87.9 Gb of nanopore sequence provided approximately 100‐fold coverage of the genome. The preliminary genome assembly was 883.2 Mb in size with a contig N50 size of 11.2 Mb. The 969 corrected contigs obtained from Bionano optical mapping were assembled into 853 scaffolds and produced an assembly of 886.5 Mb with a scaffold N50 of 16.5 Mb. Finally, using the Hi‐C data, 881.3 Mb (99.4% of genome) in 526 scaffolds were anchored and oriented in 25 chromosomes ranging in size from 25.27 to 56.49 Mb. In total, 24,770 protein‐coding genes were predicted in the genome, and ~96.85% of the genes were functionally annotated. The annotated assembly contains 93.3% complete genes from the BUSCO reference set. In addition, we identified 409 Mb (46.23% of the genome) of repetitive sequence, and 11,213 non‐coding RNAs, in the genome. Evolutionary analysis revealed that O. macrolepis diverged from common carp approximately 24.25 million years ago. The chromosomes of O. macrolepis showed an unambiguous correspondence to the chromosomes of zebrafish. The high‐quality genome assembled in this work provides a valuable genomic resource for further biological and evolutionary studies of O. macrolepis.
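The coverage and anchoring figures in this abstract can be checked with simple arithmetic. The sketch below assumes that fold coverage is total sequenced bases divided by genome size, and that the 883.2 Mb preliminary assembly and the 886.5 Mb scaffolded assembly are the relevant denominators; the abstract does not state which sizes the authors used, and the helper names are hypothetical.

```python
GB = 1e9  # bases per gigabase
MB = 1e6  # bases per megabase


def fold_coverage(sequenced_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return sequenced_bases / genome_size


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Figures taken from the abstract; the choice of denominators is an assumption.
coverage = fold_coverage(87.9 * GB, 883.2 * MB)
anchored = percent(881.3, 886.5)

print(f"coverage: {coverage:.1f}x")  # 99.5x, i.e. "approximately 100-fold"
print(f"anchored: {anchored:.1f}%")  # 99.4%
```

Under these assumptions both results agree with the rounded values reported in the abstract.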
10.
Xuefen Yang Haiping Liu Zhihong Ma Yu Zou Ming Zou Youzhi Mao Xiaomei Li Huan Wang Tiansheng Chen Weimin Wang Ruibin Yang 《Molecular ecology resources》2019,19(4):1027-1036
Triplophysa is an endemic fish genus of the Tibetan Plateau in China. Triplophysa tibetana, which lives at a recorded altitude of ~4,000 m and plays an important role in the highland aquatic ecosystem, serves as an excellent model for investigating high‐altitude environmental adaptation. However, evolutionary and conservation studies of T. tibetana have been limited by scarce genomic resources for the genus Triplophysa. In the present study, we applied PacBio sequencing and the Hi‐C technique to assemble the T. tibetana genome. A 652‐Mb genome with 1,325 contigs with an N50 length of 3.1 Mb was obtained. The 1,137 contigs were further assembled into 25 chromosomes, representing 98.7% and 80.47% of all contigs at the base and sequence number level, respectively. Approximately 260 Mb of sequence, accounting for ~39.8% of the genome, was identified as repetitive elements. DNA transposons (16.3%), long interspersed nuclear elements (12.4%) and long terminal repeats (11.0%) were the most repetitive types. In total, 24,372 protein‐coding genes were predicted in the genome, and ~95% of the genes were functionally annotated via a search in public databases. Using whole genome sequence information, we found that T. tibetana diverged from its common ancestor with Danio rerio ~121.4 million years ago. The high‐quality genome assembled in this work not only provides a valuable genomic resource for future population and conservation studies of T. tibetana, but it also lays a solid foundation for further investigation into the mechanisms of environmental adaptation of endemic fishes in the Tibetan Plateau.
11.
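Research progress in sequencing technologies for complex genomes    Total citations: 1 (self-citations: 0; citations by others: 1)
Complex genomes are genomes that cannot be resolved directly with conventional sequencing and assembly methods; the term usually refers to genomes with a high proportion of repetitive sequence, high heterozygosity, extreme GC content, or contamination by foreign DNA that is difficult to remove. Solving the sequencing and assembly problems of complex genomes requires in-depth research on three fronts: experimental methods for genome sequencing, sequencing technology platforms, and assembly algorithms and strategies. This article describes in detail the existing technologies and methods for sequencing and assembling complex genomes and, drawing on classic examples, presents the technical approaches to complex genome sequencing and how they have developed, providing a reference for designing suitable sequencing strategies for complex genomes.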
12.
Raju Chaudhary Chu Shin Koh Sampath Perumal Lingling Jin Erin E. Higgins Sateesh Kagale Mark A. Smith Andrew G. Sharpe Isobel A. P. Parkin 《Plant biotechnology journal》2023,21(3):521-535
Camelina neglecta is a diploid species from the genus Camelina, which includes the versatile oilseed Camelina sativa. These species are closely related to Arabidopsis thaliana and the economically important Brassica crop species, making this genus a useful platform to dissect traits of agronomic importance while providing a tool to study the evolution of polyploids. A highly contiguous chromosome-level genome sequence of C. neglecta with an N50 size of 29.1 Mb was generated utilizing Pacific Biosciences (PacBio, Menlo Park, CA) long-read sequencing followed by chromosome conformation phasing. Comparison of the genome with that of C. sativa shows remarkable coincidence with subgenome 1 of the hexaploid, with only one major chromosomal rearrangement separating the two. Synonymous substitution rate analysis of the predicted 34 061 genes suggested subgenome 1 of C. sativa directly descended from C. neglecta around 1.2 mya. Higher functional divergence of genes in the hexaploid as evidenced by the greater number of unique orthogroups, and differential composition of resistant gene analogs, might suggest an immediate adaptation strategy after genome merger. The absence of genome bias in gene fractionation among the subgenomes of C. sativa in comparison with C. neglecta, and the complete lack of fractionation of meiosis-specific genes attests to the neopolyploid status of C. sativa. The assembled genome will provide a tool to further study genome evolution processes in the Camelina genus and potentially allow for the identification and exploitation of novel variation for Camelina crop improvement.
13.
Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community.
14.
15.
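DNA sequencing is an important part of bioinformatics research, and de novo assembly of sequencing reads is a fundamental and important step within it. As sequencing technology continues to advance, the new third-generation sequencing data have longer read lengths and high error rates; given these properties, hybrid assembly that uses second- and third-generation sequencing data together is an important way to obtain better assemblies. This article introduces the basic principles of existing hybrid assembly software and compares the assembly results of different programs....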
16.
Oxford Nanopore Technologies (ONT) is a third‐generation sequencing technology that is gaining popularity in ecological research for its portable and low‐cost sequencing possibilities. Although the technology excels at long‐read sequencing, it can also be applied to sequence amplicons. The downside of ONT is the low quality of the raw reads. Hence, generating a high‐quality consensus sequence is still a challenge. We present Amplicon_sorter, a tool for reference‐free sorting of ONT sequenced amplicons based on their similarity in sequence and length and for building solid consensus sequences.
17.
18.
To gain genetic insights into the early-flowering phenotype of ornamental cherry, also known as sakura, we determined the genome sequences of two early-flowering cherry (Cerasus × kanzakura) varieties, ‘Kawazu-zakura’ and ‘Atami-zakura’. Because the two varieties are interspecific hybrids, likely derived from crosses between Cerasus campanulata (early-flowering species) and Cerasus speciosa, we employed the haplotype-resolved sequence assembly strategy. Genome sequence reads obtained from each variety by single-molecule real-time sequencing (SMRT) were split into two subsets, based on the genome sequence information of the two probable ancestors, and assembled to obtain haplotype-phased genome sequences. The resultant genome assembly of ‘Kawazu-zakura’ spanned 519.8 Mb with 1,544 contigs and an N50 value of 1,220.5 kb, while that of ‘Atami-zakura’ totalled 509.6 Mb with 2,180 contigs and an N50 value of 709.1 kb. A total of 72,702 and 69,528 potential protein-coding genes were predicted in the genome assemblies of ‘Kawazu-zakura’ and ‘Atami-zakura’, respectively. Gene clustering analysis identified 2,634 clusters uniquely presented in the C. campanulata haplotype sequences, which might contribute to its early-flowering phenotype. Genome sequences determined in this study provide fundamental information for elucidating the molecular and genetic mechanisms underlying the early-flowering phenotype of ornamental cherry tree varieties and their relatives.
19.
Aleksey V Zimin Alaina Shumate Ida Shinder Jakob Heinz Daniela Puiu Mihaela Pertea Steven L Salzberg 《Genetics》2022,220(2)
Until 2019, the human genome was available in only one fully annotated version, GRCh38, which was the result of 18 years of continuous improvement and revision. Despite dramatic improvements in sequencing technology, no other genome was available as an annotated reference until 2019, when the genome of an Ashkenazi individual, Ash1, was released. In this study, we describe the assembly and annotation of a second individual genome, from a Puerto Rican individual whose DNA was collected as part of the Human Pangenome project. The new genome, called PR1, is the first true reference genome created from an individual of African descent. Due to recent improvements in both sequencing and assembly technology, and particularly to the use of the recently completed CHM13 human genome as a guide to assembly, PR1 is more complete and more contiguous than either GRCh38 or Ash1. Annotation revealed 37,755 genes (of which 19,999 are protein coding), including 12 additional gene copies that are present in PR1 and missing from CHM13. Fifty-seven genes have fewer copies in PR1 than in CHM13, 9 map only partially, and 3 genes (all noncoding) from CHM13 are entirely missing from PR1.
20.
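Since the discovery of the DNA double helix, life science research has operated at the molecular level, and the sequencing technologies that emerged in the 1970s contributed enormously to deciphering the genetic code. Single-molecule sequencing technologies, which have appeared in recent years and can read nucleotide sequences at the level of a single molecule, are also known as third-generation sequencing; the main representatives include HeliScope, Nanopore and PacBio. Compared with traditional first- and second-generation sequencing, third-generation sequencing produces longer reads, can sequence RNA directly without reverse transcription, and is extremely fast; in addition, the equipment for some of these technologies can be miniaturized and carried into the field for on-site sequencing. Third-generation sequencing has broad applications in basic life science research and in clinical biomedical practice. This article focuses on the principles, advantages and disadvantages of the various single-molecule sequencing technologies and on progress in their application.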