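Magnolia officinalis (Houpo) is a well-known traditional medicinal plant of the family Magnoliaceae, genus Magnolia, and is widely cultivated in China. Its bark, root bark, branch bark, leaves, flowers and fruits can all be used medicinally or as food. To obtain whole-genome sequence information for M. officinalis, this study used leaf DNA as starting material, built a whole-genome database with PacBio Sequel third-generation sequencing, and applied bioinformatic methods to assemble, functionally annotate and evolutionarily analyse the resulting nucleotide sequences. The results showed that: ...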

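With the development of next-generation sequencing technologies, new assembly algorithms have emerged. This paper introduces the basic principles and specific steps of several new, internationally recognized assembly algorithms, and analyses the strengths, weaknesses and applicable scope of each. The performance of SSAKE, VCAKE, SHARCGS and velvet was tested on Illumina 1G sequencing data from Helicobacter acinonychis, and prospects for future research on assembly algorithms are discussed.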
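SSAKE, VCAKE and SHARCGS are generally described as greedy extension assemblers: they grow a contig by repeatedly appending the read whose prefix best overlaps the contig's current end, whereas velvet builds a de Bruijn graph. The toy sketch below illustrates only the greedy extension idea, assuming error-free reads, the forward strand only and exact overlaps. The function `greedy_extend` and its `min_overlap` parameter are hypothetical and do not reproduce any of these programs.

```python
def greedy_extend(seed: str, reads: list[str], min_overlap: int = 3) -> str:
    """Toy greedy extension: grow `seed` rightwards using exact suffix-prefix overlaps."""
    contig = seed
    unused = set(reads)
    unused.discard(seed)  # the seed itself is not reused as an extension
    while True:
        best_read, best_olap = None, 0
        # sorted() makes tie-breaking between equal overlaps deterministic
        for read in sorted(unused):
            # an overlap shorter than the read guarantees at least one new base
            max_olap = min(len(contig), len(read) - 1)
            for olap in range(max_olap, min_overlap - 1, -1):
                if contig.endswith(read[:olap]):
                    if olap > best_olap:
                        best_read, best_olap = read, olap
                    break  # longest overlap for this read found
        if best_read is None:
            return contig  # no read overlaps the contig end: stop extending
        contig += best_read[best_olap:]
        unused.remove(best_read)


reads = ["ATGGCGT", "GCGTACG", "TACGGAT"]
print(greedy_extend("ATGGCGT", reads))  # ATGGCGTACGGAT
```

Assemblers of this type also have to handle reverse complements and sequencing errors, which this sketch deliberately omits.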

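This article describes the basic principles of third-generation sequencing, characterized by single-molecule real-time sequencing and nanopore technology, and introduces the Helicos HeliScope single-molecule sequencer, the SMRT technology of Pacific Biosciences, and the nanopore single-molecule sequencing technology under development at Oxford Nanopore Technologies. It briefly compares these with other sequencing technologies, identifies problems that single-molecule sequencing still has to overcome, and offers an outlook on its future.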

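Using second-generation sequencing data to discover genomic mutations in cancer cells has long been an important scientific application. This study used a large data set from one cancer patient to evaluate several existing tools for identifying genomic mutations. Comparing the methods and accuracy of the tools showed that each has its own strengths and weaknesses. Based on these, we offer recommendations to help users choose an appropriate tool.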
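Comparing the accuracy of mutation callers usually means scoring each tool's calls against a reference call set. A minimal sketch, assuming such a "truth" set is available and that variants are matched exactly on (chromosome, position, ref, alt), computes precision, recall and F1. The function name and the example variants are hypothetical; the abstract does not say how accuracy was scored.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def evaluate_calls(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Score a tool's variant calls against a reference ("truth") call set."""
    tp = len(calls & truth)  # called and present in the truth set
    fp = len(calls - truth)  # called but absent from the truth set
    fn = len(truth - calls)  # in the truth set but missed by the tool
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
         ("chr2", 50, "C", "A"), ("chr2", 75, "T", "G")}
tool_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 60, "A", "G")}
print(evaluate_calls(tool_a, truth))
```

Exact matching is the simplest choice; real comparisons often also normalize indel representation before matching.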

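With continuing innovation in gene sequencing, new high-throughput sequencing technologies keep emerging. Third-generation sequencing (TGS), represented by the single molecule real time sequencing of Pacific Biosciences (PacBio), is gradually being applied in genome research, including assembly of large genomes, structural variation and epigenetics. This article introduces the principles, characteristics and applications of TGS, particularly in virus research, and compares it with next-generation sequencing (NGS), to provide a reference for choosing genome sequencing technologies and for their clinical application.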

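The advent of next-generation sequencing platforms has driven research on assembly algorithms and software for whole-genome shotgun sequencing data. Since 2005, many assemblers for high-throughput sequencing data have been developed and are continually being improved. In this paper, the widely used assemblers Velvet, ABySS, SOAPdenovo and CLC Genomics Workbench were each used to assemble high-throughput sequencing data from bacteriophage IME08, isolated in our laboratory. We describe the installation, use and parameter optimization of each program, compare their assembly results, and report optimized assembly parameters for each, which may serve as a reference for other researchers using these programs.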
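Velvet, ABySS and SOAPdenovo build de Bruijn graphs, so the k-mer length is the central parameter being optimized. The sketch below, assuming error-free reads on the forward strand only, builds a toy de Bruijn graph (nodes are (k-1)-mers, edges are k-mers) and spells its maximal non-branching paths (unitigs). It shows why k matters: a short k makes repeated (k-1)-mers create branches that fragment the assembly. The functions `de_bruijn` and `unitigs` are illustrative, not any assembler's code.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell maximal non-branching paths (isolated cycles are ignored here)."""
    indeg: dict[str, int] = defaultdict(int)
    nodes = set(graph)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
            nodes.add(v)

    def one_in_one_out(node: str) -> bool:
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    paths = []
    for u in sorted(nodes):
        if one_in_one_out(u):
            continue  # interior nodes are reached from a path start
        for v in sorted(graph.get(u, ())):
            path = u + v[-1]
            while one_in_one_out(v):  # walk until a branch or a dead end
                v = next(iter(graph[v]))
                path += v[-1]
            paths.append(path)
    return paths


reads = ["ATGGCGTACGGAT"]
for k in (3, 5):
    print(k, unitigs(de_bruijn(reads, k)))
```

With k = 3 the repeated 2-mers CG and GG introduce branches and the single read is split into four unitigs; with k = 5 every 4-mer is unique and the read is recovered as one unitig. Larger k, in turn, needs deeper coverage for consecutive k-mers to be observed, which is the trade-off tuned in practice.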

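Once second-generation sequencing had matured, its limitations became increasingly apparent, and third-generation sequencing has to some extent compensated for them in practice. This paper describes the principles of five third-generation high-throughput sequencing technologies currently under development, compares their strengths and weaknesses, and finally gives a brief introduction to applications of third-generation sequencing in genome sequencing, methylation research, RNA sequencing and medicine.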

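Many human diseases are associated with gene mutations, which play a crucial role in disease diagnosis and treatment. Second-generation high-throughput sequencing, characterized by high throughput, high speed and low cost, has revolutionized mutation detection. Its workflow for detecting mutations is simple: with whole-genome resequencing, targeted sequencing and transcriptome sequencing, researchers can detect mutations comprehensively and with high accuracy.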

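As bacterial genome and transcriptome studies based on second-generation sequencing become increasingly common, choosing an appropriate research strategy is increasingly important. This review covers research strategies for bacterial genomes and transcriptomes based on second-generation sequencing and briefly introduces the opportunities and challenges in this field. It summarizes the routine methods and steps of bacterial genome and transcriptome research and briefly describes existing problems. Bacterial genome and transcriptome research strategies provide a...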

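In recent years, plant whole-genome sequences have been published in rapid succession, and whole-genome sequencing of woody plants is also well under way. However, because woody plants usually have large and structurally complex genomes, sequencing and the subsequent assembly, annotation and functional analysis are all considerably difficult, and budgets for genome sequencing and analysis are also under considerable pressure. It is therefore necessary to analyse and compare the progress and remaining problems in this area in order to make forest tree genome research more efficient. Based on a comparative analysis of the three generations of sequencing technology developed so far (Sanger sequencing, sequencing by synthesis and single-molecule sequencing), this article reviews four published woody plant genomes (poplar, grape, papaya and apple) in terms of research background, sequencing results, progress in application and remaining problems. It also discusses the preparatory work required before sequencing a woody plant genome (material selection, construction of genetic and linkage maps, choice of sequencing technology), as well as the bioinformatic analysis and application of whole-genome sequencing results.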

Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community.

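Complex genomes are genomes that cannot be resolved directly with conventional sequencing and assembly methods, typically those with a high proportion of repetitive sequences, high heterozygosity, extreme GC content, or foreign DNA contamination that is hard to eliminate. Solving the sequencing and assembly of complex genomes requires in-depth work in three areas: experimental methods for genome sequencing, sequencing platforms, and assembly algorithms and strategies. This article describes in detail the existing technologies and methods for sequencing and assembling complex genomes and, using classic examples, traces the technical solutions and development of complex genome sequencing, providing a reference for designing suitable sequencing strategies for complex genomes.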

During the last three decades, both genome mapping and sequencing methods have advanced significantly, providing a foundation for scientists to understand genome structures and functions in many species. Generally speaking, genome mapping relies on genome sequencing to provide basic materials, such as DNA probes and markers, whose localization is used to construct the maps. On the other hand, genome sequencing often requires a high-resolution map as a skeleton for whole genome assembly. However, genome mapping and sequencing have never been combined in one pipeline. After reviewing mapping and next-generation sequencing methods, we would like to share our thoughts with the genome community on how to combine the HAPPY mapping technique with new-generation sequencing, thus integrating the two systems into one pipeline, called the HAPPY pipeline. The pipeline starts with preparation of a HAPPY panel, followed by multiple displacement amplification to produce a relatively large quantity of DNA. Instead of conventional marker genotyping, the amplified panel DNA samples are subjected to barcoded new-generation sequencing, which allows us to determine the presence or absence of each sequence contig, treated as a traditional marker, in the HAPPY panel. Statistical analysis will then be performed to infer how close to or far from each other these contigs lie within the genome and to order the whole genome sequence assembly. We believe that such a universal approach will play an important role in genome sequencing, mapping and assembly of many species, thus advancing genome science and its applications in biomedicine and agriculture.
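In HAPPY mapping, contigs that lie close together tend to be co-retained in the same panel sample, while distant contigs are retained independently. A minimal sketch, assuming each contig is called present or absent per barcoded sample from its read count (the `min_reads` threshold is a hypothetical choice), scores each contig pair by how often their calls disagree: low discordance suggests linkage. This is a simple distance proxy for illustration, not the statistical analysis the authors propose, and all names and counts are hypothetical.

```python
from itertools import combinations


def call_presence(read_counts: dict[str, list[int]], min_reads: int = 1) -> dict[str, list[int]]:
    """Convert per-sample read counts for each contig into 1 (present) / 0 (absent)."""
    return {ctg: [1 if c >= min_reads else 0 for c in counts]
            for ctg, counts in read_counts.items()}


def discordance(a: list[int], b: list[int]) -> float:
    """Fraction of panel samples in which exactly one of the two contigs is present."""
    if len(a) != len(b):
        raise ValueError("presence vectors must cover the same panel samples")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_discordance(panel: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    """Discordance for every unordered contig pair (lower = likely closer)."""
    return {(u, v): discordance(panel[u], panel[v])
            for u, v in combinations(sorted(panel), 2)}


counts = {  # hypothetical read counts per contig across 8 panel samples
    "ctgA": [12, 9, 0, 0, 7, 0, 15, 0],
    "ctgB": [10, 11, 0, 1, 8, 0, 0, 0],
    "ctgC": [0, 6, 9, 0, 0, 13, 0, 5],
}
panel = call_presence(counts, min_reads=2)
for pair, d in sorted(pairwise_discordance(panel).items(), key=lambda kv: kv[1]):
    print(pair, round(d, 3))
```

Here ctgA and ctgB disagree in only one of eight samples and would be placed next to each other, while ctgC looks unlinked to both.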

Annotated genomes can provide new perspectives on the biology of species. We present the first de novo whole genome sequencing for the pink-footed goose. In order to obtain a high-quality de novo assembly the strategy used was to combine one short insert paired-end library with two mate-pair libraries. The pink-footed goose genome was assembled de novo using three different assemblers and an assembly evaluation was subsequently performed in order to choose the best assembler. For our data, ALLPATHS-LG performed the best, since the assembly produced covers most of the genome, while introducing the fewest errors. A total of 26,134 genes were annotated, with bird species accounting for virtually all BLAST hits. We also estimated the substitution rate in the pink-footed goose, which can be of use in future demographic studies, by using a comparative approach with the genome of the chicken, the mallard and the swan goose. A substitution rate of 1.38 × 10⁻⁷ per nucleotide per generation was obtained when comparing the genomes of the two closely-related goose species (the pink-footed and the swan goose). Altogether, we provide a valuable tool for future genomic studies aiming at particular genes and regions of the pink-footed goose genome as well as other bird species.
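The abstract does not state exactly how the per-generation rate was derived. A common molecular-clock relation, given here only as an assumed illustration, is μ = K / (2T): K is the pairwise divergence between two species in substitutions per site (often Jukes-Cantor corrected from the raw proportion p of differing sites), T is the time since their split in generations, and the factor 2 reflects substitutions accumulating along both lineages. The function names and inputs below are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """JC69 correction: observed proportion of differing sites -> substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def rate_per_generation(divergence: float, split_years: float, generation_years: float) -> float:
    """mu = K / (2T), with T = split time converted from years to generations."""
    generations = split_years / generation_years
    return divergence / (2 * generations)


# hypothetical inputs: 2% raw difference, 5 Myr split, 3-year generation time
k = jukes_cantor(0.02)
print(k, rate_per_generation(k, 5e6, 3))
```

The estimate is only as good as the assumed split time and generation time, which usually dominate its uncertainty.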

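The monokaryotic enoki mushroom (Flammulina velutipes) strain "6-3" was sequenced with both second- and third-generation sequencing, and four assembly strategies were applied for de novo genome assembly and compared. In terms of assembly metrics, assembly from second-generation data alone performed worst: contigs longer than 10 kb totalled only 24.6 Mb, the contig N50 was only 23 kb, and the assembly rate was only 59.27%. Assembling third-generation reads and correcting them with second-generation reads performed best: contigs longer than 10 kb totalled 38.3 Mb, the contig N50 was 2.8 Mb, and the assembly rate reached 92.16%. For conserved single-copy genes, the genome sequences from all four strategies were compared against the Basidiomycota conserved single-copy genes in the BUSCO database, and gene completeness exceeded 94% in every case. For assembly accuracy, validation by PCR amplification and Sanger sequencing showed that the genome obtained by third-generation assembly with second-generation correction was complete and contiguous and contained the fewest SNPs and InDels. In summary, third-generation assembly with second-generation correction yields a genome with a large contig N50, a high assembly rate and high base accuracy, making it a good strategy for sequencing edible fungus genomes.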
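The contiguity statistics above can be computed from a list of contig lengths. In this minimal sketch, N50 is the length L such that contigs of length at least L contain at least half of all assembled bases. The "assembly rate" is assumed here to be assembled length divided by an estimated genome size, since the abstract does not define it; all function names are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs >= L hold at least half of the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def total_length_longer_than(lengths: list[int], min_len: int = 10_000) -> int:
    """Summed length of contigs longer than `min_len` (cf. the '>10 kb' statistic)."""
    return sum(x for x in lengths if x > min_len)


def assembly_rate(lengths: list[int], genome_size: int) -> float:
    """Percent of an estimated genome size covered by assembled bases (assumed definition)."""
    return 100 * sum(lengths) / genome_size


contigs = [2_800_000, 1_500_000, 40_000, 8_000, 3_000]  # hypothetical contig lengths
print(n50(contigs), total_length_longer_than(contigs), round(assembly_rate(contigs, 5_000_000), 2))
```

N50 rewards a few long contigs, which is why it separates the long-read strategy from the short-read-only assembly so clearly, while BUSCO completeness can stay similar across strategies.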

Illumina's Genome Analyzer generates ultra-short sequence reads, typically 36 nucleotides in length, and is primarily intended for resequencing. We tested the potential of this technology for de novo sequence assembly on the 6 Mbp genome of Pseudomonas syringae pv. syringae B728a with several freely available assembly software packages. Using an unpaired data set, velvet assembled >96% of the genome into contigs with an N50 length of 8289 nucleotides and an error rate of 0.33%. edena generated smaller contigs (N50 of 4192 nucleotides) with comparable error rates. ssake and vcake yielded shorter contigs with very high error rates. Assembly of paired-end sequence data with 400 bp inserts produced longer contigs (N50 up to 15 628 nucleotides), but with increased error rates (0.5%). Contig length and error rate were very sensitive to the choice of parameter values. Noncoding RNA genes were poorly resolved in de novo assemblies, while >90% of the protein-coding genes were assembled with 100% accuracy over their full length. This study demonstrates that, in practice, de novo assembly of 36-nucleotide reads can generate reasonably accurate assemblies from sequence data sets of about 40× depth. These draft assemblies are useful for exploring an organism's proteomic potential at very low cost.
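The "about 40×" figure translates directly into a read count through the standard coverage relation c = N·L/G (N reads of length L on a genome of size G). The sketch below uses the rounded 6 Mbp genome size from the text and also gives the idealized Lander-Waterman (Poisson) estimate e^(-c) of the fraction of bases left uncovered; real Illumina coverage is less uniform than this model assumes. Function names are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected depth c = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest read count N with N * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases with zero reads: e^-c."""
    return math.exp(-coverage)


# 40x on a ~6 Mbp genome with 36-nt reads (genome size rounded as in the text)
n = reads_needed(40, 36, 6_000_000)
print(n, mean_coverage(n, 36, 6_000_000), expected_uncovered_fraction(40))
```

Roughly 6.7 million 36-nt reads are needed; under the Poisson model essentially no base is left uncovered at 40×, so the fragmentation reported above comes mainly from short reads failing to span repeats rather than from coverage gaps.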



Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease-causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research and increasingly the clinical use of sequence data is likely to remain focused on the protein coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq), for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome targeted regions.


We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that had been subjected to both deep exome and whole genome sequencing. Detection sensitivity was scored by downsampling the sequence data and comparing the calls against a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of read coverage and less bias in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS requires a mean of only 14 reads. Known disease-causing mutations are not biased towards easy- or hard-to-sequence regions of the genome for either exome-seq or WGS.
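Why depth uniformity matters can be seen in a simple model. A minimal sketch, assuming idealized Poisson-distributed depth (closer to WGS than to capture data) and a hypothetical caller that needs at least `min_depth` reads and `min_alt` alternate-allele reads at a heterozygous site, computes expected detection sensitivity as a function of mean depth. Uneven exome capture spreads depth more widely than a Poisson distribution, so more sites fall below the thresholds at the same mean, which is consistent with exome-seq needing a higher mean depth. This is not the authors' downsampling procedure, and the thresholds are assumptions.

```python
import math


def poisson_pmf(n: int, lam: float) -> float:
    """P(depth = n) for Poisson-distributed depth with mean lam."""
    if lam <= 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(Binomial(n, p) >= k): alt-allele reads at a heterozygous site."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def het_snp_sensitivity(mean_depth: float, min_depth: int = 8, min_alt: int = 3) -> float:
    """Expected fraction of heterozygous SNPs passing both (hypothetical) caller thresholds."""
    # truncate the Poisson sum far into its upper tail
    max_depth = int(mean_depth + 10 * math.sqrt(mean_depth) + 20)
    return sum(poisson_pmf(n, mean_depth) * prob_at_least(min_alt, n)
               for n in range(min_depth, max_depth + 1))


for depth in (5, 14, 40):
    print(depth, round(het_snp_sensitivity(depth), 3))
```

Because the pass/fail indicator only improves with more reads, sensitivity rises monotonically with mean depth under this model; modelling exome-seq would require an overdispersed depth distribution in place of `poisson_pmf`.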


From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection owing to technical difficulties or probe failures. Because WGS yields intrinsically richer data that can reveal polymorphisms outside coding regions as well as genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

The leopard coral grouper, Plectropomus leopardus, a member of the subfamily Epinephelinae, is a carnivorous coral reef fish widely distributed in tropical and subtropical waters of the Indo‐Pacific. Owing to its attractive appearance and delicious taste, P. leopardus has become a popular commercial species for aquaculture in many countries. However, the lack of genomic and molecular resources for P. leopardus has hindered the study of its biology and genomic breeding programmes. Here we report the de novo sequencing and assembly of the P. leopardus genome using a combination of 10x Genomics, high‐throughput chromosome conformation capture (Hi‐C) and PacBio long‐read sequencing technologies. The genome assembly has a total length of 881.55 Mb with a scaffold N50 of 34.15 Mb, consisting of 24 pseudochromosome scaffolds. BUSCO analysis showed that 97.2% of the conserved single‐copy genes were retrieved, indicating that the assembly was nearly complete. We predicted 25,248 protein‐coding genes, of which 96.5% were functionally annotated. Comparative genomic analyses revealed that gene family expansions in P. leopardus were associated with immune‐related pathways. In addition, we identified 5,178,453 single nucleotide polymorphisms based on genome resequencing of 54 individuals. The P. leopardus genome and genomic variation data provide valuable genomic resources for studies of its genetics, evolution and biology. In particular, they are expected to benefit the development of genomic breeding programmes in the farming industry.

