Similar articles
Found 20 similar articles; search took 125 ms
1.
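Wang Hao, Chen Ting. 《生物信息学》 (Chinese Journal of Bioinformatics), 2021, 19(1): 26-34
DNA sequencing is one of the central topics in bioinformatics research, and de novo assembly of sequencing reads is a fundamental and important step within it. As sequencing technology continues to evolve, third-generation sequencing data offer longer reads but higher error rates. Given these properties, hybrid assembly that uses second- and third-generation sequencing data together is an important way to obtain better assembly results. This paper introduces the basic principles of existing hybrid assembly software and compares the assembly results of different tools...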

2.
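Wang Zhiming, Pan Yuanlong, Wu Jun, Zhu Baoli. 《微生物学报》 (Acta Microbiologica Sinica), 2012, 52(10): 1219-1227
[Objective] To close the gaps in the genome of BCG Tice, the American strain of Bacillus Calmette-Guerin (BCG), in order to obtain its complete genome sequence. [Methods] BCG Tice was first subjected to high-throughput sequencing, and the resulting data were assembled with SOAPdenovo. Because some genomic regions have low coverage or poor sequencing quality, assembly yields numerous contigs; adjacent contigs whose relative positions are known form a scaffold, and the unsequenced regions between contigs are gaps. Primers were designed at contig ends for PCR amplification to obtain PCR products bridging adjacent contigs, and these PCR products were sequenced. By optimizing the PCR primer design strategy, trying different polymerases, adjusting PCR conditions, and cloning PCR products for sequencing, the gaps between contigs were closed. [Results] Whole-genome sequencing of BCG Tice was completed and its complete genome sequence obtained; the sequence has been submitted to the GenBank database of the National Center for Biotechnology Information (NCBI). [Conclusion] BCG is a high-GC Gram-positive bacterium with a genomic GC content as high as 65.65%. Using gap closure in the BCG Tice genome as an example, this paper outlines the problems encountered and the strategies adopted when closing gaps in high-GC genomes, in the hope of providing a reference for gap closure in whole-genome sequencing of other high-GC species.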

3.
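Peganum multisectum is a common plant in the desert regions of northwestern China, valued for wind resistance and sand fixation, prevention of soil erosion, antibacterial and insecticidal activity, and antitumor effects. To promote the development and utilization of Peganum species and to fill gaps in molecular-level research such as gene function and metabolic pathways, this study used the Illumina high-throughput sequencing platform to sequence the leaf transcriptome of P. multisectum, followed by transcriptome assembly, functional annotation, and sequence-level and expression-level analyses. The results showed...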

4.
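With the development of high-throughput RNA sequencing (RNA-Seq) and the rapid decline in sequencing costs, RNA-Seq has become an important tool in biological research, giving biologists the opportunity to study the transcriptome comprehensively. High-throughput sequencing is characterized by short reads, a certain proportion of sequencing errors, and large data volumes, so RNA-Seq data analysis differs from genome analysis and from traditional EST data analysis. This paper introduces the different sequencing platforms and the computational workflow for raw data generation and low-quality data filtering, examines short-read alignment, transcriptome assembly, functional annotation, and differential expression analysis, reviews applications of RNA-Seq in entomological research, and concludes with a summary and outlook for RNA-Seq technology.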

5.
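Advances in high-throughput sequencing and falling costs have made genome sequencing a standard procedure in bacterial research. However, bacterial genomes are highly variable; correction against a closely related genome and de novo assembly each have advantages and drawbacks, and there has been no systematic study of how to accurately recover more valid functional genes. In this study, a strain of Bacillus pumilus isolated from a natural environment was sequenced, and three strategies were compared: correction against a closely related genome, de novo assembly, and assembly of the reads remaining after correction (correction + assembly). The evaluation found that correction against a closely related genome recovers genes already present in the reference genome more accurately; de novo assembly recovers most valid genes but introduces many false positives; and the correction + assembly strategy combines both but also introduces the most false positives. The analysis also found that assembly results that can be annotated below the phylum level are highly reliable, which effectively reduces the false positives introduced by assembly. This study offers strategic guidance for environmental microbiology and will promote research on the functional genomics of environmental microorganisms.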

6.
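[Background] As sequencing costs fall, more and more scientists use high-throughput sequencing to study phage genome sequences. Through analysis of these genomic data, some researchers have developed methods for identifying the terminal sequences of dsDNA phages, but these methods rely on Linux command-line tools, and no software is available for the Windows operating system. [Objective] To develop PhageGT, a free Windows program that can find dsDNA phage genome terminal sequences in the very large sequence files produced by high-throughput sequencing. [Methods] A dialog-based Microsoft Foundation Classes (MFC) application was developed with Visual Studio 2019. The software is written in C++, reads each read from the sequence file line by line, and applies purpose-built algorithms for counting and calculation. [Results] PhageGT extracts the frequencies and ranking of the different sequences in a high-throughput sequencing file, and uses the ratio of the highest frequency to the average frequency of the extracted sequences (the R value) to determine whether the phage genome has terminal sequences. [Conclusion] PhageGT is convenient and simple to use. PhageGT and all test data used in this paper are freely available from https://zenodo.org/record/4674231#.YHADb-gzZxc.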
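
The R value described in the abstract above can be illustrated with a small Python sketch. This is not PhageGT's implementation: the abstract does not say which subsequences are extracted, so the sketch assumes the first `k` bases of each read are counted, and the names `r_value` and `k` are hypothetical.

```python
from collections import Counter


def r_value(reads, k=20):
    """Return max frequency / mean frequency of read-start k-mers.

    Assumption (not from the paper): the "extracted sequence" is the
    first k bases of each read; reads shorter than k are skipped.
    """
    counts = Counter(read[:k] for read in reads if len(read) >= k)
    if not counts:
        raise ValueError("no read is at least k bases long")
    highest = max(counts.values())
    # Mean frequency = total occurrences / number of distinct sequences.
    mean = sum(counts.values()) / len(counts)
    return highest / mean


if __name__ == "__main__":
    demo_reads = ["AAAA", "AAAA", "AAAA", "CCCC", "GGGG"]
    print(r_value(demo_reads, k=4))
```

A start sequence that occurs far more often than the average pushes the ratio up, which is the signal the abstract associates with a defined genome terminus; any threshold for calling a terminus would have to come from the paper itself.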

7.
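Objective: Many studies have confirmed that mitochondrial DNA (mtDNA) mutations are closely associated with tumorigenesis and tumor progression, but conventional sequencing methods cannot readily detect mtDNA mutations with high throughput and high accuracy. This study therefore established an mtDNA mutation detection method based on next-generation sequencing. Methods: Total DNA was extracted from tumor tissue, adjacent non-tumor tissue, and peripheral blood cells of liver cancer patients. The mitochondrial genome was enriched by PCR, and mtDNA sequencing libraries were constructed by blunt-end or sticky-end ligation of the PCR products or by amino modification of the PCR primers. After sequencing on the Illumina HiSeq 2000 platform, reads were aligned to the human mtDNA reference sequence using bioinformatics methods, and the sequencing data were analyzed. Results: After evaluating genomic DNA of varying quality, the three-primer-pair method was found suitable for mtDNA enrichment in most DNA samples. Amino modification of the PCR primers was further found to markedly improve the uniformity of sequencing coverage and reduce sequencing cost. Conclusion: By optimizing the mtDNA enrichment method and the uniformity of sequencing coverage, this study established a sensitive, specific, and high-throughput mtDNA mutation detection strategy based on next-generation sequencing, providing a new method for research on mtDNA mutations and disease.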

8.
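This review covers strategies for applying high-throughput sequencing to whole mitochondrial genome sequencing. The methods fall into two categories. The first enriches the target mtDNA before sequencing, by extraction and purification of mtDNA, PCR amplification of target regions, or hybridization enrichment with specific probes (either microarray-based or PCR-probe-based), and then sequences the enriched mitochondrial DNA at high throughput. The second first mines mitochondrial genome sequence information from the high-throughput genomic data of the sample, then assembles it with the software MITObim using bait sequences or the complete mitochondrial reference genome of a closely related species. The review also presents an optimized workflow diagram for mitochondrial high-throughput sequencing and describes strategies for high-throughput sequencing of mitochondria from mixed samples.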

9.
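Copy number variation (CNV) refers to an increase or decrease in the copy number of large DNA segments in the genome. Existing research shows that CNVs cause a variety of human diseases and are closely related to the mechanisms of their onset and progression. The advent of high-throughput sequencing provided technical support for CNV detection, and in human disease research and clinical diagnosis and treatment it has become the mainstream CNV detection technology. Although new algorithms and software based on high-throughput sequencing continue to be developed, accuracy remains unsatisfactory. This paper comprehensively reviews CNV detection methods based on high-throughput sequencing data, including read-depth methods, paired-end mapping methods, split-read methods, de novo assembly methods, and methods combining these four approaches. It discusses the principle of each class of method, representative software tools, the data each is suited to, and their strengths and weaknesses, and looks ahead to future directions.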
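
As a rough illustration of the read-depth idea named in the abstract above, the Python sketch below counts read start positions in fixed windows and compares each window with the median depth. It is a toy under stated assumptions, not any published tool: the function names, the window size, and the 0.5 and 1.5 ratio thresholds are all hypothetical, and real callers also correct for GC bias and mappability.

```python
from statistics import median


def depth_ratios(read_starts, genome_length, window=1000):
    """Count reads per fixed-size window and divide by the median count."""
    n_windows = -(-genome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < genome_length:
            counts[pos // window] += 1
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median depth is zero; coverage too low to normalize")
    return [count / baseline for count in counts]


def label_windows(ratios, loss=0.5, gain=1.5):
    """Label each window with illustrative (hypothetical) ratio thresholds."""
    labels = []
    for ratio in ratios:
        if ratio <= loss:
            labels.append("loss")
        elif ratio >= gain:
            labels.append("gain")
        else:
            labels.append("normal")
    return labels


if __name__ == "__main__":
    starts = [5] * 10 + [1500] * 10 + [2500] * 20 + [3500] * 2
    print(label_windows(depth_ratios(starts, genome_length=4000)))
```

Normalizing by the median rather than the mean keeps a few strongly amplified windows from shifting the baseline.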

10.
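Studying feeding during the early growth stages of fish helps reveal their food sources and their functional position in the food web, and obtaining comprehensive and accurate information on prey species is key. The development of high-throughput sequencing has brought unprecedented opportunities and challenges to the study of animal diets. This study took juvenile spotted scat (金钱鱼, Scatophagus argus) from the artificial pier waters of Daya Bay as the subject and 18S rDNA as the target, analyzed their diet composition using both conventional Sanger sequencing and Illumina Solexa high-throughput sequencing, and compared the suitability of the two methods for feeding studies of juvenile fish. The results showed that the juveniles are omnivorous with high dietary diversity, and that ciliates and bryozoans were the most dominant prey groups. Conventional sequencing yielded 67 valid prey sequences belonging to 8 groups and covering 23 taxa; high-throughput sequencing yielded more than 17,000 valid prey sequences belonging to 9 groups and covering 35 taxa. The prey groups detected by the two methods were largely the same, but high-throughput sequencing better reflected dietary diversity and coverage and was more sensitive, detecting dinoflagellate and brown algal taxa missed by conventional sequencing. This indicates that high-throughput sequencing can cover the dietary spectrum of juvenile fish fairly comprehensively and accurately. The large volume of data it produces can provide semi-quantitative information to some extent, overcoming the limitations of conventional sequencing in quantitative studies. With broader dietary coverage and higher detection sensitivity, high-throughput sequencing markedly improves the credibility of data and results in juvenile fish feeding studies and can provide strong support for research on the feeding ecology of marine organisms.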

11.
12.
Even with the ubiquity of Sanger sequencing, automated assembly software is predominantly available as stand-alone packages for desktop/laptop use with very few online equivalents, thus geospatially constraining sequence analysis and assembly. With increased data output worldwide, there is also a need for automated quality checks and trimming prior to large assemblies, along with automated detection of mutations. Through web servers with expanded automation and functionalities, even smartphones/phablets can be used to perform complex analysis previously limited to desktops, especially if they can upload files from cloud storage. To facilitate such online accessible sequence assembly and analysis, we created the Yet Another Quick Assembly, Analysis and Trimming Tool web server for the automated de novo assembly of multiple .ab1 and .FASTQ sequencing reads, with automated trimming and scanning of the assembled sequences for single nucleotide polymorphisms and insertions or deletions. It requires no software installation, so it can be accessed from anywhere with Internet access and with minimal dependency on other software and web tools.

13.
Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variant detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.

14.

Background: Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information that can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing.

15.
ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. AVAILABILITY: Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.

16.
17.
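Objective: To study and develop gap-filling methods for whole-genome assembly from high-throughput sequencing data. Methods: The algorithms of assembly software were studied, a program for automatic gap filling was written in Perl, and a whole-genome assembly workflow was established. Results: An end-extension method for gap filling was proposed and implemented in Perl; in the assembly of Rickettsia high-throughput sequencing data, these methods greatly reduced the number of gaps. Conclusion: The end-extension method proposed in this study efficiently fills the gaps that arise during whole-sequence assembly and is highly practical.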

18.
MOTIVATION: During the process of high-throughput genome sequencing there are opportunities for mixups of reagents and data associated with particular projects. The sequencing templates or sequence data generated for an assembly may become contaminated with reagents or sequences from another project, resulting in poorer quality and inaccurate assemblies. RESULTS: We have developed a system to assess sequence assemblies and monitor for laboratory mixups. We describe several methods for testing the consistency of assemblies and resolving mixed ones. We use statistical tests to evaluate the distribution of sequencing reads from different plates into contigs, and a graph-based approach to resolve situations where data has been inappropriately combined. While these methods have been designed for use in a high-throughput DNA sequencing environment processing thousands of clones, they can be applied in any situation where distinct sequencing projects are performed at redundant coverage.

19.
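Micromonospora echinospora ATCC 15837 is a high-GC, Gram-positive rare actinomycete that produces the enediyne antitumor antibiotic calicheamicin (CLM). No whole-genome sequence of M. echinospora ATCC 15837 has been reported to date, which limits research on its metabolite biosynthetic pathways and comparative genomics. This study sequenced the whole genome of M. echinospora ATCC 15837 for the first time using high-throughput sequencing and used bioinformatics software to assemble and annotate the data. Assembly with Velvet yielded 77 contigs with a GC content of 72.36% and a genome size of about 7.69 Mb. The sequence has been submitted to the GenBank database of the National Center for Biotechnology Information (NCBI) under accession number NGNT00000000. This study is the first report of the whole-genome sequence of the calicheamicin-producing strain M. echinospora ATCC 15837; it analyzes the basic features of the genome and predicts the strain's secondary metabolite biosynthetic gene clusters, providing a theoretical basis for subsequent work on metabolic regulation and synthetic biology.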
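
The summary figures quoted in the abstract above (contig count, genome size, GC content) are straightforward to derive from a list of assembled contigs. The Python sketch below shows one way to compute them; it is illustrative only, is not the authors' pipeline, and the function name `assembly_stats` is hypothetical.

```python
def assembly_stats(contigs):
    """Return contig count, total size in bases, and GC percentage."""
    total = sum(len(contig) for contig in contigs)
    if total == 0:
        raise ValueError("assembly contains no bases")
    gc = 0
    for contig in contigs:
        upper = contig.upper()  # tolerate soft-masked (lowercase) bases
        gc += upper.count("G") + upper.count("C")
    return {
        "contigs": len(contigs),
        "size": total,
        "gc_percent": 100 * gc / total,
    }


if __name__ == "__main__":
    print(assembly_stats(["GGCC", "atgc"]))
```

Note that this simple version counts ambiguous bases such as N in the total length, which slightly lowers the reported GC percentage for draft assemblies that contain them.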

20.
Background: Next-generation sequencing (NGS) technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. However, numerous technical or computational challenges in de novo assembly still remain, although many new ideas and solutions have been suggested to tackle the challenges in both experimental and computational settings. Results: In this review, we first briefly introduce some of the major challenges faced by NGS sequence assembly. Then, we analyze the characteristics of various sequencing platforms and their impact on assembly results. After that, we classify de novo assemblers according to their frameworks (overlap graph-based, de Bruijn graph-based and string graph-based), and introduce the characteristics of each assembly tool and the scenarios to which it is suited. Next, we introduce in detail the solutions to the main challenges of de novo assembly of next-generation sequencing data, single-cell sequencing data and single-molecule sequencing data. Finally, we discuss the application of SMS long reads in solving problems encountered in NGS assembly. Conclusions: This review not only gives an overview of the latest methods and developments in assembly algorithms, but also provides guidelines to determine the optimal assembly algorithm for a given input sequencing data type.
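
Of the three assembler frameworks named in the abstract above, the de Bruijn graph is the simplest to sketch. The Python example below builds the graph from error-free reads: each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is a teaching sketch with a hypothetical function name, not the implementation of any assembler covered by the review; real tools also handle sequencing errors, reverse complements, and graph simplification.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it precedes."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read; reads shorter
        # than k contribute no k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn(["ACGT", "CGTA"], k=3).items():
        print(node, "->", successors)
```

Repeated edges are kept in the successor lists on purpose, since edge multiplicity reflects k-mer coverage, which assemblers typically use when resolving repeats and pruning errors.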
