Similar literature
 19 similar articles found
1.
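Wang Hao (王昊), Chen Ting (陈挺). 《生物信息学》, 2021, 19(1): 26-34
DNA sequencing is one of the central topics in bioinformatics research, and de novo assembly of sequencing reads is a fundamental and important step within it. As sequencing technology continues to advance, third-generation sequencing data offer longer reads but also higher error rates. Given these properties, hybrid assembly that uses second- and third-generation sequencing data together is an important way to obtain better assemblies. This paper introduces the basic principles of existing hybrid assembly software and compares the assembly results of different programs. Finally, it offers suggestions on choosing assembly software and on research directions for new hybrid assembly methods.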

2.
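Comparison of four commonly used high-throughput sequencing assembly programs   Total citations: 1, self-citations: 0, citations by others: 1
The emergence of next-generation sequencing platforms has driven research on assembly algorithms and software for whole-genome shotgun sequencing data. Since 2005, many sequence assemblers for high-throughput sequencing have been developed, and they are continually being improved to give better assemblies. In this paper, the widely used high-throughput sequencing assemblers Velvet, ABySS, SOAPdenovo, and CLC Genomics Workbench were each used to assemble the high-throughput sequencing data of bacteriophage IME08, a strain isolated in our laboratory. We describe the installation, use, and parameter optimization of these assemblers, compare the assembly results of the different programs, and report optimized assembly parameters for each, which may serve as a reference for other researchers using these programs.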

3.
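Biological sequence assembly and its algorithms   Total citations: 1, self-citations: 0, citations by others: 1
Biological sequence assembly is an important step in shotgun sequencing. This paper introduces biological sequence assembly and some basic problems involved in its study, outlines the two main classes of biological sequence assembly algorithms, analyzes their respective characteristics, and compares them.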

4.
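The new generation of massively parallel sequencing technologies can raise sequencing throughput at a fairly low unit cost, but assembling large numbers of short DNA fragments is extremely challenging; unless the key step of sequence assembly improves, the whole genome-mapping effort is slowed down. A research group led by Professors Wang Jian (汪建) and Wang Jun (王俊) of the BGI research institute (华大基因研究院) has made a new breakthrough in genome assembly technology, in the latest issue of 《Genome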

5.
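Copy number variation (CNV) is an increase or decrease in the copy number of large DNA segments in the genome. Existing research shows that CNVs are a cause of many human diseases and are closely related to the mechanisms of their onset and progression. The advent of high-throughput sequencing provided technical support for CNV detection, and in fields such as human disease research and clinical diagnosis and treatment, high-throughput sequencing has become the mainstream CNV detection technology. Although new algorithms and software based on high-throughput sequencing keep being developed, their accuracy is still unsatisfactory. This paper comprehensively reviews CNV detection methods based on high-throughput sequencing data, including read-depth methods, paired-end mapping methods, split-read methods, de novo assembly methods, and methods that combine these four. It discusses in depth the principle of each class of method, representative software tools, the data each class suits, and their strengths and weaknesses, and it looks ahead to future directions.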

6.
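Advances in high-throughput sequencing and falling costs have made genome sequencing a standard procedure for studying bacteria. However, bacterial genomes are highly variable; correction against a closely related genome and de novo genome assembly each have advantages and disadvantages, and how to accurately recover more valid functional genes has not been studied systematically. In this study, a strain of Bacillus pumilus isolated from a real environment was sequenced, and three assembly strategies were compared: correction against a closely related genome, de novo genome assembly, and assembly of the reads left over after correction (correction + assembly). The performance of each strategy was evaluated: correction against a closely related genome recovers genes already present in the reference genome more accurately; de novo assembly recovers most valid genes but introduces many false positives; the correction + assembly strategy combines both, but also introduces the most false positives. The analysis also found that assembly results annotated below the phylum level are highly reliable, which effectively reduces the false positives introduced by assembly. This study provides strategic guidance for environmental microbiology research and will promote research on the functional genomics of environmental microorganisms.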

7.
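Book news
《生物技术通讯》, 2013, (1): 40+52+70+90+103+108
Gene Sequencing Experimental Techniques (基因测序实验技术), published by Chemical Industry Press (化学工业出版社). This book is a comprehensive work on gene sequencing technology. It covers the development of gene sequencing, RNA sequencing, DNA sequencing, genome sequencing and assembly, and applications of gene sequencing. The book not only describes in some detail the specific operations of the relevant techniques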

8.
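New book introduction
《生物产业技术》, 2012, (6): 95-95
This book is a comprehensive work on gene sequencing technology. It covers the development of gene sequencing, RNA sequencing, DNA sequencing, genome sequencing and assembly, and applications of gene sequencing. The book not only describes in some detail the specific operations and procedures of the relevant techniques, but also concentrates on an in-depth analysis of the basic principles of each technique and the theory behind them.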

9.
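《生命科学研究》, 2014, (5): 458-464
Rapid progress in high-throughput sequencing has brought new opportunities and challenges to bioinformatics. Second-generation sequencing yields large numbers of short reads, so earlier sequence analysis methods no longer apply. In recent years the number of sequence analysis algorithms and programs for high-throughput sequencing has grown steadily, to more than a hundred, which makes choosing suitable software difficult. This paper summarizes the sequencing types, read types, and analysis algorithms of second-generation sequencing; analyzes and compares in detail the commonly used analysis software in terms of supported read type and length, underlying algorithm, input/output formats, features, and functions; and gives recommendations. It also analyzes current problems in sequencing technology and sequence analysis and predicts future directions.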

10.
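New book introduction
《生物产业技术》, 2013, (1): 88-88
Gene Sequencing Experimental Techniques (基因测序实验技术). This book is a comprehensive work on gene sequencing technology. It covers the development of gene sequencing, RNA sequencing, DNA sequencing, genome sequencing and assembly, and applications of gene sequencing. The book not only describes in some detail the specific operations and procedures of the relevant techniques, but also concentrates on an in-depth analysis of the basic principles of each technique and the theory behind them.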

11.
Since the completion of the cucumber and panda genome projects using Illumina sequencing in 2009, the global scientific community has had to pay much more attention to this new cost-effective approach to generating the draft sequence of large genomes. To allow new users to more easily understand the assembly algorithms and the optimum software packages for their projects, we make a detailed comparison of the two major classes of assembly algorithms, overlap-layout-consensus and de Bruijn graph, from how they match the Lander-Waterman model to the required sequencing depth and read length. We also discuss the computational efficiency of each class of algorithm, the influence of repeats and heterozygosity, and points of note in the subsequent scaffold linkage and gap closure steps. We hope this review can help further promote the application of second-generation de novo sequencing, as well as aid the future development of assembly algorithms.
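
The Lander-Waterman model mentioned in this abstract relates sequencing depth and read length to how fragmented an assembly is expected to be. A minimal sketch in Python, assuming the standard textbook form of the model (reads placed uniformly at random, a fixed minimum detectable overlap); the function and parameter names are illustrative and do not come from the paper:

```python
import math


def lander_waterman(genome_size, read_length, num_reads, min_overlap):
    """Return (coverage, expected fraction covered, expected number of contigs).

    Assumes reads start uniformly at random along the genome and that two
    reads are joined only if they overlap by at least `min_overlap` bases.
    """
    # Coverage c = N * L / G
    coverage = num_reads * read_length / genome_size
    # sigma = 1 - T / L: fraction of a read that is not needed to detect an overlap
    sigma = 1.0 - min_overlap / read_length
    # Poisson approximation for the fraction of bases hit by at least one read
    fraction_covered = 1.0 - math.exp(-coverage)
    # Expected number of contigs ("islands"): N * exp(-c * sigma)
    expected_contigs = num_reads * math.exp(-coverage * sigma)
    return coverage, fraction_covered, expected_contigs
```

Under these assumptions, raising the depth lowers the expected contig count exponentially, while a larger required overlap relative to read length (as with short reads) weakens that effect.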

12.
State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated. We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. Second, we used the output of different short read assembly programs so as to leverage the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm. The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta strategy, wherein sequencing data are integrated as early as possible and in particular ways and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another.

13.
Background: Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. However, numerous technical or computational challenges in de novo assembly still remain, although many new ideas and solutions have been suggested to tackle the challenges in both experimental and computational settings. Results: In this review, we first briefly introduce some of the major challenges faced by NGS sequence assembly. Then, we analyze the characteristics of various sequencing platforms and their impact on assembly results. After that, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based and string graph-based), and introduce the characteristics of each assembly tool and the scenarios it suits. Next, we introduce in detail the solutions to the main challenges of de novo assembly of next-generation sequencing data, single-cell sequencing data and single-molecule sequencing (SMS) data. Finally, we discuss the application of SMS long reads in solving problems encountered in NGS assembly. Conclusions: This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines to determine the optimal assembly algorithm for a given input sequencing data type.

14.
Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems.

15.
For the last twenty years fragment assembly was dominated by the "overlap - layout - consensus" algorithms that are used in all currently available assembly tools. However, the limits of these algorithms are being tested in the era of genomic sequencing and it is not clear whether they are the best choice for large-scale assemblies. Although the "overlap - layout - consensus" approach proved to be useful in assembling clones, it faces difficulties in genomic assemblies: the existing algorithms make assembly errors even in bacterial genomes. We abandoned the "overlap - layout - consensus" approach in favour of a new Eulerian Superpath approach that outperforms the existing algorithms for genomic fragment assembly (Pevzner et al. 2001, in Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB-01), 256-26). In this paper we describe our new EULER-DB algorithm that, similarly to the Celera assembler, takes advantage of clone-end sequencing by using the double-barreled data. However, in contrast to the Celera assembler, EULER-DB does not mask repeats but uses them instead as a powerful tool for contig ordering. We also describe a new approach for the Copy Number Problem: "How many times is a given repeat present in the genome?" For long nearly-perfect repeats this question is notoriously difficult and some copies of such repeats may be "lost" in genomic assemblies. We describe our EULER-CN algorithm for the Copy Number Problem that proved to be successful in difficult sequencing projects.
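
The Eulerian idea behind this family of assemblers can be illustrated with a toy example: break reads into k-mers, treat each k-mer as an edge between its (k-1)-mer prefix and suffix, and walk every edge exactly once. The Python sketch below is only that textbook idea, using Hierholzer's algorithm on error-free reads with no repeats; it is not the EULER-DB or Eulerian Superpath algorithm, and all function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to its (k-1)-mer suffixes, one edge per distinct k-mer."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes the graph actually has an Eulerian path."""
    out_deg = {u: len(vs) for u, vs in graph.items()}
    in_deg = defaultdict(int)
    for vs in graph.values():
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if there is one
    start = next((n for n in nodes if out_deg.get(n, 0) - in_deg.get(n, 0) == 1), None)
    if start is None:
        start = min(nodes)  # Eulerian cycle: any node works
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node while backtracking
    return path[::-1]


def spell(path):
    """Concatenate a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])
```

For the reads `["ATGGCGT", "GGCGTGC", "CGTGCA"]` with `k=4`, `spell(eulerian_path(de_bruijn_edges(reads, 4)))` reconstructs `ATGGCGTGCA`. Real data need error correction and repeat resolution, which is where the approaches described in the abstract come in.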

16.
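Transcript assembly is a key step in transcriptome studies based on second-generation sequencing. Its quality directly affects the reliability of downstream results, and it is both a current research focus and a difficult problem. Transcript assembly methods fall into two classes, genome-guided and de novo, each with strengths and weaknesses in theoretical basis and algorithmic implementation. Assembly quality depends on factors such as the PCR amplification error rate, the accuracy of second-generation sequencing, the assembly algorithm, and the completeness of the reference genome, and existing algorithms cannot yet fully handle the effects of these factors. This paper discusses transcript assembly methods and software, factors that affect assembly quality, and metrics for evaluating assembly quality, with the aim of guiding bench biologists in choosing analysis software.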

17.
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages and by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics from mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.

18.
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
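
Read depth as used in this abstract is the total number of sequenced bases divided by the genome size. A minimal Python sketch of that arithmetic follows; the 100 bp read length in the usage note is an assumption made for illustration and is not stated in the abstract, and the function names are hypothetical.

```python
import math


def depth(genome_size_bp, read_length_bp, num_reads):
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


def reads_for_depth(genome_size_bp, read_length_bp, target_depth):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)
```

With an assumed read length of 100 bp, reaching 50X on a 4.6-megabase genome takes `reads_for_depth(4_600_000, 100, 50)`, that is 2,300,000 reads; the same call with the other genome sizes quoted in the abstract scales linearly.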

19.
A case study in genome-level fragment assembly   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: We use the fact of two teams independently sequencing the one megabase genome of Borrelia burgdorferi as an opportunity to study the accuracy of genome-level assembly. RESULTS: We compare the results of three different assembly programs (PHRAP, TIGR Assembler, and STROLL) on the DNA fragments used in both the Brookhaven and TIGR sequencing projects. We also describe the algorithms and data structures used in our assembly program STROLL, which was used in the Brookhaven Borrelia project.
