Similar articles
 20 similar articles found (search time: 296 ms)
1.
宋琳琳  顾朝辉  韦朝春  陈赛娟 《生物磁学》2009,(15):2899-2902,2912
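Objective: To study data analysis and quality assessment methods suited to next-generation sequencing data, which are large in volume and short in read length. Methods: Published sequencing data from the Illumina-Solexa platform were selected as the study material. The reads were aligned to the whole human genome sequence with the MAQ software, and, taking exonic regions as an example, sequencing data quality was assessed at the site level. Results: By combining existing software with a linear algorithm developed in this study, a sequencing data quality assessment system covering alignment and assembly was established. After alignment, the raw reads were found to cover 127,113,378 sites in total, involving 64,868 exons on 24 chromosomes. Of these exons, 0.50% had every site sequenced, and 3.98% had a mean per-site sequencing depth of at least 1. Conclusion: A data analysis and quality assessment method based on the Illumina-Solexa sequencing platform was successfully built, and it can be applied to other second-generation sequencing platforms. On the basis of the quality assessment, researchers can refine their sequencing experiment design and proceed to SNP and mutation screening and subsequent functional studies.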
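
The site-level assessment described above can be illustrated with a minimal Python sketch. It is not the authors' linear algorithm, which the abstract does not describe; the function name `exon_coverage_stats`, the per-position depth dictionary, and the inclusive exon intervals on a single chromosome are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def exon_coverage_stats(
    depth: Dict[int, int], exons: List[Tuple[int, int]]
) -> Tuple[float, float]:
    """Summarise exon coverage on one chromosome (hypothetical helper).

    depth maps a position to its sequencing depth; missing positions count as 0.
    exons holds (start, end) intervals, both ends inclusive.
    Returns (fraction of exons with every site covered,
             fraction of exons with mean per-site depth >= 1).
    """
    fully_covered = 0
    mean_depth_ge_1 = 0
    for start, end in exons:
        depths = [depth.get(pos, 0) for pos in range(start, end + 1)]
        if all(d > 0 for d in depths):
            fully_covered += 1
        if sum(depths) / len(depths) >= 1:
            mean_depth_ge_1 += 1
    n = len(exons)
    return fully_covered / n, mean_depth_ge_1 / n
```

An exon can reach a mean depth of 1 while still containing unsequenced sites, which is why the two fractions reported in the abstract differ.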

2.
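This article briefly reviews progress in mapping reads to the genome after sequencing and in post-alignment analysis, and discusses genomic data analysis and its applications with reference to RNA sequencing, miRNA sequencing, and capture sequencing. Next-generation sequencing (NGS) technologies are dramatically changing the way biologists conduct research. Different NGS platforms, such as Illumina Solexa, Life Tech SOLiD, and Roche 454, offer unprecedented massively parallel sequencing capacity, and within a short time biological samples …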

3.
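Dedicated alignment software for multiplex PCR methylation targeted sequencing data is still lacking. This study evaluated the performance of nine alignment schemes on such data in terms of mean CPU run time, mean peak memory, mean mapping rate, F1 score, mean alignment speed, alignment failure rate, and differentially methylated sites, as well as the effect of bisulfite conversion rate and sequencing error rate on mapping rate. A scoring system was established to rank the schemes comprehensively. The top three schemes were Bismarkbwt2 (8.098 points), BWA-meth (7.846 points), and Bismarkbwt1 (7.840 points). All three had an F1 score of 1.000 and gave the best mapping rates across different bisulfite conversion rates and sequencing error rates. In addition, Bismarkbwt2 yielded the most differentially methylated sites and the lowest alignment failure rate, and performed well on mean peak memory and mean mapping rate. This study therefore recommends Bismark in Bowtie2 mode as the alignment software for building subsequent bioinformatics pipelines for multiplex PCR methylation targeted sequencing.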
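
For reference, the F1 score used as one of the criteria is the harmonic mean of precision and recall. The sketch below assumes reads are counted as true positives, false positives, and false negatives; the abstract does not state how the study assigned reads to those classes, and `f1_score` is a hypothetical helper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts.

    Returns 0.0 when there are no true positives, which avoids
    dividing by zero in the precision and recall terms.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

An F1 score of 1.000, as reported for the top three schemes, requires both zero false positives and zero false negatives under this definition.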

4.
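Objective: To study a fast and accurate method for finding indels in next-generation sequencing data, especially single-end sequencing data. Methods: Reads are first aligned rapidly against the whole-genome reference sequence to select those containing indels; these reads then undergo a second, bidirectional alignment to determine the indel length; finally, the length information is used to find the exact position of the indel and related information within the narrowed range. Results: The FIND (Fast INDel detection system) system was successfully built to find indels in single-end sequencing data. On simulated test data at 12X, FIND achieved a sensitivity of 87.71% and a specificity of 99.66%, and its performance improved further as sequencing depth increased. Conclusion: By making full use of the information obtained during alignment, the approximate position of an indel is determined together with its length, so that indels in single-end sequencing data can be found quickly and accurately within a local range.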
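
The idea of scanning from both ends to fix an indel's length and narrow down its position can be sketched in Python as follows. This is a simplified stand-in, not the FIND implementation: it assumes the reference window spanned by the read is already known, that read and window differ by exactly one indel and no substitutions, and the name `locate_single_indel` is hypothetical.

```python
from typing import Optional, Tuple


def locate_single_indel(read: str, ref: str) -> Optional[Tuple[str, int, int, int]]:
    """Return (kind, length, leftmost, rightmost) for a single indel, else None.

    kind is "deletion" if the read lacks bases present in ref, "insertion"
    otherwise. leftmost and rightmost are 0-based offsets into the shorter
    sequence between which the indel can be placed.
    """
    diff = len(ref) - len(read)
    if diff == 0:
        return None  # equal lengths: no net indel
    short_len = min(len(read), len(ref))

    # Forward pass: length of the common prefix.
    prefix = 0
    while prefix < short_len and read[prefix] == ref[prefix]:
        prefix += 1

    # Backward pass: length of the common suffix.
    suffix = 0
    while suffix < short_len and read[-1 - suffix] == ref[-1 - suffix]:
        suffix += 1

    # Prefix and suffix must jointly cover the shorter sequence.
    if prefix + suffix < short_len:
        return None

    kind = "deletion" if diff > 0 else "insertion"
    return kind, abs(diff), short_len - suffix, prefix
```

When the indel falls inside a repeat, several placements are equally valid, so the function reports a range rather than a single position.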

5.
李鑫  李凯  李一佳  马磊 《生物信息学》2016,14(3):188-194
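SeqMule automatically adjusts its parameters according to the human genome or exome data supplied, and analyzes and annotates single nucleotide polymorphisms (SNPs) in all the sequencing data. Objective: By analyzing experimental data from two gout patients, this article introduces the SeqMule software in detail to bioinformatics researchers, with the aim of providing a one-stop analysis route for whole-genome and exome sequencing data. Methods: Based on the alignment and analysis tools built into SeqMule, namely BWA (Burrows-Wheeler Aligner), GATK (The Genome Analysis Toolkit), SAMtools, and Freebayes, and taking DNA sequencing data from two gout patients as an example, this article describes the features and operation of SeqMule in detail and performs automated alignment and SNP analysis on the two patients' exome sequencing data. SeqMule was found to resolve a number of problems present in many analysis programs; it enables comprehensive, flexible, and efficient automated analysis of exome and whole-genome sequencing data, handles high-throughput sequencing data better, and ultimately improves the consistency and accuracy of data analysis.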

6.
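Principles and applications of single-molecule real-time sequencing technology   Total citations: 1 (self-citations: 0, citations by others: 1)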
柳延虎  王璐  于黎 《遗传》2015,37(3):259-268
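Single-molecule DNA sequencing is a new generation of sequencing technology developed over the past decade, also known as third-generation sequencing; it includes single-molecule real-time sequencing, true single-molecule sequencing, and single-molecule nanopore sequencing. This article describes the basic principles, performance, and applications of single-molecule real-time (SMRT) sequencing. Compared with Sanger sequencing and next-generation sequencing, SMRT sequencing offers very long reads, short run times, no need for template amplification, and direct detection of epigenetic modification sites, giving researchers a new option. At the same time, the low accuracy of SMRT sequencing (about 85%) is controversial; about 93% of the errors are insertions and deletions, so the data must be error-corrected before being used for genome assembly. SMRT sequencing has already been applied successfully to de novo sequencing and complete assembly of small genomes, and it has shown or will show its strengths in epigenetics, transcriptomics, and large-genome assembly, advancing genomics research.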

7.
李文轲  李丰余  张思瑶  蔡斌  郑娜  聂宇  周到  赵倩 《遗传》2014,36(6):618-624
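The development of second-generation sequencing places high demands on the processing and analysis of sequencing data. Many programs for second-generation sequencing data analysis exist, but most perform only a single function (for example, only read alignment, variant calling, or functional annotation), so selecting and integrating them correctly and efficiently has become an urgent need. This article presents an automated pipeline, based on Perl and SGE resource management, for analyzing genome sequencing data from the Illumina platform. The pipeline takes raw sequencing reads as input, calls industry-standard data processing software (such as BWA, Samtools, GATK, and ANNOVAR), and produces a list of variant sites with functional annotation that researchers can analyze further. Automated parallel scripts keep the pipeline running efficiently and deliver analysis results and reports in one stop, reducing manual work during data analysis and greatly improving efficiency. Users only need to fill in a configuration file or enter settings through a graphical interface to complete the whole procedure. This work gives researchers a convenient way to analyze second-generation sequencing data.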

8.
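Objective: To screen, in a high-throughput manner, for differentially expressed miRNAs in myocardial tissue from fetuses with congenital heart disease by combining expression microarrays with next-generation sequencing. Methods: The experimental group consisted of second-trimester fetuses with congenital malformations, and the control group of fetuses of the same gestational age from inevitable miscarriages without cardiac malformation. Ventricular myocardial tissue was collected, and changes in myocardial microRNA expression were examined using both the Agilent Human 2.0 microRNAs expression microarray and SOLiD next-generation sequencing. The data were analyzed with bioinformatics methods, and the microarray results were validated by real-time PCR. Results: Differential miRNA screening identified 24 miRNAs that were differentially expressed in the congenital heart malformation group in both the microarray and the next-generation sequencing data. Bioinformatics analysis predicted 1,606 target genes. Gene Ontology analysis showed that the target genes were mainly related to cellular processes, metabolic processes, and biological regulation, and pathway significance analysis showed that some target genes are key factors in biological signaling pathways. Four of the shared differentially expressed miRNAs were chosen at random for validation, and the quantitative PCR results were broadly consistent with the combined microarray and next-generation sequencing screen. Conclusion: These miRNAs aberrantly expressed in congenital heart disease provide important clues to its molecular pathogenesis and may offer new targets for the diagnosis and treatment of heart-related diseases and for the development of new drugs.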

9.
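Third-generation single-molecule sequencing is a new sequencing technology with high throughput, high speed, and long reads, but its higher error rate compared with second-generation sequencing has limited its applications. To reduce the error rate and improve assembly quality, this study used Arabidopsis thaliana as the subject and, through data simulation and other bioinformatics approaches, compared the error-correction efficiency of different programs on third-generation reads and the results of hybrid assembly at different sequencing depths, in order to find the sequencing depth and hybrid assembly software giving the best result. Four hybrid correction programs, PBcR, LoRDEC, Jabba, and Proovread, were compared for their correction of second- and third-generation data. LoRDEC raised nucleotide accuracy from 85% to 99% in the shortest time and was the best choice among the programs evaluated. Building on the LoRDEC-corrected data, and to minimize sequencing cost, hybrid assembly was further analyzed with second-generation data at 25× to 100× and third-generation data at 20× to 50×, to find the sequencing strategy with the lowest cost and the best hybrid assembly. The results show that the hybrid assembler DBG2OLC and the third-generation assembler canu produce longer assembled sequences and better gene completeness.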

10.
彭焕文  王伟 《植物学报》2023,(2):261-273
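Phylogenetics is the discipline that studies the evolutionary relationships among groups of organisms. With improvements in sequencing technology, analytical methods, and computing power, molecular data have come into wide use and have driven the rapid development of phylogenetics. Phylogenetic trees have become a powerful tool in ecology, comparative biology, and other fields. However, many studies focus on the use of various software packages when building phylogenetic trees, and some basic principles or caveats are sometimes played down or even ignored. This article describes in detail the workflow and basic methods for reconstructing phylogenetic trees from molecular data, including the key steps of taxon sampling, molecular marker selection, sequence alignment, partitioning and model selection, combined sequence analysis, and topology testing. It also provides software workflows and run commands for the three commonly used tree-building methods (maximum parsimony, maximum likelihood, and Bayesian inference), as a reference for related research.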

11.
12.
ABSTRACT: BACKGROUND: Roche 454 sequencing is the leading sequencing technology for producing long read high throughput sequence data. Unlike most methods where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis. RESULTS: To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm for solving the optimal local alignment between a 454 nucleotide and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient minimal extension of the Smith-Waterman-Gotoh algorithm that easily fits into other tools. Experiments using HAXAT demonstrate, through the introduction of 454 specific frame-shift penalties, significantly increased accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two to five-fold increase in Matthews Correlation Coefficient over previous algorithms, for 454-derived data. CONCLUSIONS: The increased accuracy provided by HAXAT not only results in improved homologue estimations, but also provides uninterrupted reading frames, which greatly facilitate further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat.
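
HAXAT is described as a minimal extension of the Smith-Waterman-Gotoh algorithm. As background only, the following Python sketch computes the score of that base algorithm (local alignment with affine gap penalties) on two plain strings. It does not implement HAXAT's frame-shift handling or flowpeak guidance, and the function name and scoring values are assumptions chosen for illustration.

```python
def sw_gotoh_score(
    a: str,
    b: str,
    match: int = 2,
    mismatch: int = -1,
    gap_open: int = -3,
    gap_extend: int = -1,
) -> int:
    """Best local alignment score with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j), floored at 0.
    # E: best score ending with a gap in a; F: ending with a gap in b.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

Keeping the gap states E and F separate from H is what lets an opened gap be extended at a lower cost, and it is the kind of structure to which an extra frame-shift transition could be attached.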

13.
Multiple sequence alignments are powerful tools for understanding the structures, functions, and evolutionary histories of linear biological macromolecules (DNA, RNA, and proteins), and for finding homologs in sequence databases. We address several ontological issues related to RNA sequence alignments that are informed by structure. Multiple sequence alignments are usually shown as two-dimensional (2D) matrices, with rows representing individual sequences, and columns identifying nucleotides from different sequences that correspond structurally, functionally, and/or evolutionarily. However, the requirement that sequences and structures correspond nucleotide-by-nucleotide is unrealistic and hinders representation of important biological relationships. High-throughput sequencing efforts are also rapidly making 2D alignments unmanageable because of vertical and horizontal expansion as more sequences are added. Solving the shortcomings of traditional RNA sequence alignments requires explicit annotation of the meaning of each relationship within the alignment. We introduce the notion of “correspondence,” which is an equivalence relation between RNA elements in sets of sequences as the basis of an RNA alignment ontology. The purpose of this ontology is twofold: first, to enable the development of new representations of RNA data and of software tools that resolve the expansion problems with current RNA sequence alignments, and second, to facilitate the integration of sequence data with secondary and three-dimensional structural information, as well as other experimental information, to create simultaneously more accurate and more exploitable RNA alignments.

14.
15.
We aim to compare the performance of Bowtie2, bwa-mem, blastn and blastx when aligning bacterial metagenomes against the Comprehensive Antibiotic Resistance Database (CARD). Simulated reads were used to evaluate the performance of each aligner under the following four performance criteria: correctly mapped, false positives, multi-reads and partials. The optimal alignment approach was applied to samples from two wastewater treatment plants to detect antibiotic resistance genes using next generation sequencing. blastn mapped with the greatest accuracy among the four sequence alignment approaches considered, followed by Bowtie2. blastx generated the greatest number of false positives and multi-reads when aligned against the CARD. The performance of each alignment tool was also investigated using error-free reads. Although each aligner mapped a greater number of error-free reads as compared to Illumina-error reads, in general, the introduction of sequencing errors had little effect on alignment results when aligning against the CARD. Given each performance criterion, blastn was found to be the most favourable alignment tool and was therefore used to assess resistance genes in sewage samples. Beta-lactam and aminoglycoside were found to be the most abundant classes of antibiotic resistance genes in each sample.

Significance and Impact of the Study

Antibiotic resistance genes (ARGs) are pollutants known to persist in wastewater treatment plants among other environments, thus methods for detecting these genes have become increasingly relevant. Next generation sequencing has brought about a host of sequence alignment tools that provide a comprehensive look into antimicrobial resistance in environmental samples. However, standardizing practices in ARG metagenomic studies is challenging since results produced from alignment tools can vary significantly. Our study provides sequence alignment results of synthetic and authentic bacterial metagenomes mapped against an ARG database using multiple alignment tools, and the best practice for detecting ARGs in environmental samples.

16.
Protein sequence alignment has become an essential task in modern molecular biology research. A number of alignment techniques have been documented in the literature, and their corresponding tools are available as freeware and commercial software. Choosing and using these tools, and fully interpreting their alignment results, is often considered non-trivial by end-users with limited skill in bioinformatics algorithm development. Here, for educational purposes, we compare sequence alignment techniques based on dynamic programming (N-W, S-W) and heuristics (LFASTA, BL2SEQ) on four sets of sequence data. The analysis suggests that heuristics-based methods are faster than dynamic programming methods in alignment speed.
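
The two dynamic programming methods compared here, N-W (Needleman-Wunsch, global) and S-W (Smith-Waterman, local), differ only in how the matrix is initialised and whether scores are floored at zero. A minimal Python sketch, assuming a linear gap penalty and illustrative scoring values (`alignment_score` is a hypothetical name); the heuristic tools LFASTA and BL2SEQ are not sketched.

```python
def alignment_score(
    a: str,
    b: str,
    match: int = 1,
    mismatch: int = -1,
    gap: int = -2,
    local: bool = False,
) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False gives the Needleman-Wunsch (global) score,
    local=True gives the Smith-Waterman (local) score.
    """
    n, m = len(a), len(b)
    # First row: leading gaps are penalised globally, free locally.
    prev = [0 if local else j * gap for j in range(m + 1)]
    best = 0
    for i in range(1, n + 1):
        cur = [0 if local else i * gap] + [0] * m
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if local:
                v = max(v, 0)  # a local alignment may restart anywhere
            cur[j] = v
            best = max(best, v)
        prev = cur
    # Global: score in the last cell. Local: best cell anywhere.
    return best if local else prev[m]
```

Both variants fill every cell of an n-by-m matrix, which is the cost that heuristic methods avoid and is consistent with the speed difference the abstract reports.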

17.
Z Sun  W Tian 《PloS one》2012,7(8):e42887
The third generation of sequencing technologies produces sequence reads of 1000 bp or more that may contain high polymorphism information. However, most currently available sequence analysis tools are developed specifically for analyzing short sequence reads. While the traditional Smith-Waterman (SW) algorithm can be used to map long sequence reads, its naive implementation is computationally infeasible. We have developed a new Sequence mapping and Analyzing Program (SAP) that implements a modified version of SW to speed up the alignment process. In benchmarks with simulated and real exon sequencing data and real E. coli genome sequence data generated by the third-generation sequencing technologies, SAP outperforms currently available tools for mapping short and long sequence reads in both speed and proportion of captured reads. In addition, it achieves high accuracy in detecting SNPs and InDels in the simulated data. SAP is available at https://github.com/davidsun/SAP.

18.
The advent of next generation sequencing (NGS) technologies has revolutionised the way biologists produce, analyse and interpret data. Although NGS platforms provide a cost-effective way to discover genome-wide variants from a single experiment, variants discovered by NGS need follow-up validation due to the high error rates associated with various sequencing chemistries. Recently, whole exome sequencing has been proposed as an affordable option compared to whole genome runs, but it still requires follow-up validation of all the novel exomic variants. Customarily, a consensus approach is used to overcome the systematic errors inherent to the sequencing technology, alignment and post-alignment variant detection algorithms. However, this approach requires multiple sequencing chemistries, multiple alignment tools and multiple variant callers, which may not be viable in terms of time and money for individual investigators with limited informatics know-how. Biologists often lack the requisite training to deal with the huge amount of data produced by NGS runs and face difficulty in choosing from the list of freely available analytical tools for NGS data analysis. Hence, there is a need to customise the NGS data analysis pipeline to preferentially retain true variants by minimising the incidence of false positives, and to make the choice of the right analytical tools easier. To this end, we have sampled different freely available tools used at the alignment and post-alignment stages, suggesting the use of the most suitable combination determined by a simple framework of pre-existing metrics to create significant datasets.

19.
20.
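With advances in large-scale sequencing technology, the number of sequences deposited in databases is growing rapidly, and most of them are ESTs (expressed sequence tags) of unknown function. Hints about the function of an EST are usually obtained through protein-EST sequence alignment. ESTs contain errors at a rate of about 5%, the most serious being frameshift errors, and the usual approach of translating an EST in all six frames into protein sequences before alignment handles frameshift errors poorly. By considering the various possible errors in EST sequences and back-translating the amino acid sequence into a nucleotide sequence, at the nucleotide …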
