Found 20 similar articles (search time: 125 ms)
1.
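Objective: To build an effective screening system for recessive genetic mutations from zebrafish high-throughput sequencing data, using a series of quality-control and mutation-filtering steps. Methods: (A) Reads were aligned to the zebrafish reference genome, and GATK was used to obtain the initial variant calls in VCF format. (B) Local scripts written in Perl were used to filter and annotate the variants. (C) The distribution of homozygous variant sites was plotted to locate regions enriched for mutations. These steps were integrated into a genetic mutation screening system for zebrafish high-throughput sequencing data. Results: The variant sites obtained were subjected to a series of quality controls, localized to specific regions on specific chromosomes, and functionally annotated. Conclusion: The filtering and screening system effectively controls the false-positive rate and provides a strong reference for screening pathogenic recessive mutation sites in zebrafish.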
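Step (C) of the abstract above, plotting the distribution of homozygous variants to find enriched regions, can be illustrated with a small windowed count. This is only a sketch under stated assumptions, not the authors' Perl scripts: the function name `homozygous_window_counts`, the 1 Mb window size, and the single-sample VCF layout with a `GT` field are all illustrative choices.

```python
from collections import Counter


def homozygous_window_counts(vcf_lines, window=1_000_000):
    """Count homozygous-ALT genotype calls per (chromosome, window index).

    Assumes a single-sample VCF whose FORMAT column contains GT.
    The window size is an arbitrary illustrative default.
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column present
        chrom, pos = fields[0], int(fields[1])
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        alleles = genotype.replace("|", "/").split("/")
        # Homozygous ALT: two identical alleles that are neither reference nor missing.
        if len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] not in ("0", "."):
            counts[(chrom, (pos - 1) // window)] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:20",
    "chr1\t500\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:18",
    "chr1\t1500000\t.\tG\tA\t50\tPASS\t.\tGT:DP\t1|1:25",
    "chr2\t42\t.\tT\tC\t50\tPASS\t.\tGT:DP\t1/1:30",
]
window_counts = homozygous_window_counts(example)
```

Windows with unusually high counts would be candidates for the mutation-enriched region; in practice the counts would be plotted per chromosome rather than inspected directly.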
2.
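Dedicated alignment software for multiplex PCR methylation targeted sequencing data is still lacking. This study evaluated the performance of nine alignment schemes on such data, covering mean CPU time, mean peak memory, mean mapping rate, F1 score, mean alignment speed, alignment failure rate, and differentially methylated sites, as well as how the mapping rate is affected by the bisulfite conversion rate and the sequencing error rate. A scoring system was established to evaluate the schemes comprehensively. The top three schemes were, in order, Bismarkbwt2 (8.098 points), BWA-meth (7.846 points), and Bismarkbwt1 (7.840 points). All three achieved an F1 score of 1.000 and had the best mapping rates across the tested bisulfite conversion rates and sequencing error rates. In addition, Bismarkbwt2 yielded the most differentially methylated sites and the lowest alignment failure rate, and performed well on mean peak memory and mean mapping rate. This study therefore recommends Bismark in Bowtie2 mode as the aligner for subsequent bioinformatics pipelines for multiplex PCR methylation targeted sequencing.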
3.
4.
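Processing and analysis of ChIP-seq data in next-generation sequencing  Total citations: 1 (self-citations: 0; citations by others: 1)
Chromatin immunoprecipitation sequencing (ChIP-seq), which combines chromatin immunoprecipitation (ChIP) with next-generation high-throughput sequencing, has become a key technology in functional genomics, particularly in the study of gene expression regulation. The massive data produced by ChIP-seq experiments pose new challenges for bioinformatics researchers. Because data processing methods in this field lag far behind experimental advances, a systematic introduction to and review of all aspects of ChIP-seq data processing is needed so that more researchers can enter the field to design or improve the relevant algorithms. Using worked examples, this article describes the entire ChIP-seq data workflow in detail and focuses on its main problems and key steps, giving researchers in this field a quick yet thorough overview.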
5.
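Objective: To develop a fast and accurate method for finding indels in next-generation sequencing data, especially single-end data. Methods: Reads are first aligned rapidly against the whole-genome reference sequence to select those containing indels; these reads then undergo a second, bidirectional alignment to determine the indel length; finally, the length information is used to find the exact position of the indel and related information within the delimited range. Results: We built FIND (Fast INDel detection system) for finding indels in single-end sequencing data. With simulated sequencing data at 12X coverage, FIND achieved a sensitivity of 87.71% and a specificity of 99.66%, and performance improved as sequencing depth increased. Conclusion: By making full use of the information obtained during alignment, the approximate position of an indel is determined together with its length, so that indels in single-end sequencing data can be found quickly and accurately within a local range.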
6.
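High-throughput sequencing technology and its applications  Total citations: 14 (self-citations: 0; citations by others: 14)
High-throughput sequencing is a milestone in the history of DNA sequencing and offers unprecedented opportunities for modern life science research. This article describes in detail the latest advances in high-throughput sequencing, including second-generation technologies represented by 454, Solexa, and SOLiD, single-molecule sequencing technologies represented by HeliScope TIRM and Pacific Biosciences SMRT, and the Ion Personal Genome Machine (PGM) recently launched by Life Science. On this basis, it discusses applications of high-throughput sequencing in genome sequencing, transcriptome sequencing, gene expression regulation, detection of transcription factor binding sites, and methylation research. Finally, it discusses the problems of cost and downstream data analysis and the prospects for future development.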
7.
8.
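Next-generation sequencing is now used for genome analysis of microorganisms, humans, animals, and plants. Sample preparation is a prerequisite for large-scale sequencing and fundamental to its success. The main factors that interfere with large-scale sequencing include polyA interference with the sequencing signal and the masking of low-abundance genes by high-abundance genes. Using leaves of Viola verecunda A. Gray as material, total RNA was extracted and double-stranded cDNA was synthesized; the double-stranded cDNA was normalized with DSN nuclease, its polyA was removed, the treated cDNA was TA-cloned, and 100 clones were picked at random for sequencing. In the untreated cDNA sample, polyA impaired correct reading of nearby bases in 15 clones and only 62 clones were independent, whereas in the treated sample no polyA was found and 94 clones were independent. Sequence analysis showed that two clones from the treated sample corresponded to sequences for which MALDI-TOF had detected protein peaks in the sample but whose genes had never been isolated by cloning. Amplifying two known genes with very different expression levels from the treated cDNA template showed that the difference in PCR yield between them was markedly reduced after treatment. These results indicate that cDNA samples treated by polyA removal and DSN nuclease fully meet the requirements of large-scale sequencing and new-gene discovery.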
9.
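Aberrant DNA methylation may give rise to copy number variants (CNVs), and CNVs may in turn alter DNA methylation levels. Whole genome bisulfite sequencing (WGBS) yields DNA-level sequencing data and therefore has the potential to be mined for CNVs, but how well CNVs can be detected from WGBS data remains unclear. This study selected five tools that detect CNVs with different strategies (BreakDancer, cn.mops, CNVnator, DELLY, Pindel) and, using real (2.62 billion reads) and simulated (12.35 billion reads) human sequencing data, performed 150 CNV detection runs. The number of CNVs detected, precision, recall, relative detection power, memory usage, and run time were evaluated in order to identify the best approach for detecting CNVs from WGBS data. On the real WGBS data, Pindel detected the largest number of deletion and duplication CNVs; CNVnator had the highest precision for deletions and cn.mops for duplications; Pindel had the highest recall for deletions and cn.mops for duplications...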
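Precision and recall for CNV callers, as evaluated in the abstract above, depend on how a call is matched to a truth-set CNV. The abstract does not state the matching rule, so the sketch below assumes a 50% reciprocal-overlap criterion on half-open intervals, which is a common convention but not necessarily the one used in the study. All names and example coordinates are illustrative.

```python
def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions for intervals (chrom, start, end)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def precision_recall(calls, truth, min_overlap=0.5):
    """Precision = matched calls / all calls; recall = matched truth CNVs / all truth CNVs."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(t, c) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Made-up intervals: one call matches a truth CNV, one call is a false positive.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 6000), ("chr2", 100, 900)]
calls = [("chr1", 1100, 2100), ("chr1", 8000, 9000)]
precision, recall = precision_recall(calls, truth)
```

Here one of two calls matches, giving a precision of 0.5, and one of three truth CNVs is recovered, giving a recall of one third. Deletions and duplications would be scored separately in a real evaluation.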
10.
11.
Noncanonical microRNAs (miRNAs) and endogenous small interfering RNAs (endo-siRNAs) are distinct subclasses of small RNAs that bypass the DGCR8/DROSHA Microprocessor but still require DICER1 for their biogenesis. What role, if any, they have in mammals remains unknown. To identify potential functional properties for these subclasses, we compared the phenotypes resulting from conditional deletion of Dgcr8 versus Dicer1 in post-mitotic neurons. The loss of Dicer1 resulted in an earlier lethality, more severe structural abnormalities, and increased apoptosis relative to that from Dgcr8 loss. Deep sequencing of small RNAs from the hippocampus and cortex of the conditional knockouts and control littermates identified multiple noncanonical microRNAs that were expressed at high levels in the brain relative to other tissues, including mirtrons and H/ACA snoRNA-derived small RNAs. In contrast, we found no evidence for endo-siRNAs in the brain. Taken together, our findings provide evidence for a diverse population of highly expressed noncanonical miRNAs that together are likely to play important functional roles in post-mitotic neurons.
12.
Qian Liu, Qiang Hu, Song Yao, Marilyn L. Kwan, Janise M. Roh, Hua Zhao, Christine B. Ambrosone, Lawrence H. Kushi, Song Liu, Qianqian Zhu. Genomics, Proteomics & Bioinformatics, 2019, 17(2): 211-218
As next-generation sequencing (NGS) technology has become widely used to identify genetic causal variants for various diseases and traits, a number of packages for checking NGS data quality have sprung up in public domains. In addition to the quality of sequencing data, sample quality issues, such as gender mismatch, abnormal inbreeding coefficient, cryptic relatedness, and population outliers, can also have a fundamental impact on downstream analysis. However, there is a lack of tools specialized in identifying problematic samples from NGS data, often due to the limitation of sample size and variant counts. We developed SeqSQC, a Bioconductor package, to automate and accelerate sample cleaning in NGS data of any scale. SeqSQC is designed for efficient data storage and access, and is equipped with interactive plots for intuitive data visualization to expedite the identification of problematic samples. SeqSQC is available at http://bioconductor.org/packages/SeqSQC.
13.
14.
E. González-Tortuero, J. Rusek, A. Petrusek, S. Gießler, D. Lyras, S. Grath, F. Castro-Monzón, J. Wolinska. Molecular Ecology Resources, 2015, 15(6): 1385-1395
Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within-population variation. Additionally, a public Illumina data set was used to validate the pipeline on community-level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within-population structure but also the successful application of the QRS pipeline on Illumina-generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.
15.
Summary: Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance.
16.
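Copy number variation refers to an increase or decrease in the copy number of large DNA segments in the genome. Existing research shows that copy number variation is a cause of many human diseases and is closely tied to the mechanisms of their onset and progression. The advent of high-throughput sequencing has provided technical support for copy number variation detection, and in human disease research and clinical diagnosis and treatment it has become the mainstream detection technology. Although new algorithms and software based on high-throughput sequencing continue to be developed, their accuracy is still unsatisfactory. This article comprehensively reviews methods for detecting copy number variation from high-throughput sequencing data, including read-depth methods, paired-end mapping methods, split-read methods, de novo assembly methods, and methods that combine these four. It examines in depth the principle of each class of method, its representative software tools, the data it suits, and its strengths and weaknesses, and it looks ahead to future directions.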
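Of the method classes listed in this review abstract, the read-depth approach is the simplest to illustrate: depth in each genomic window is compared with a baseline, and windows with a strongly shifted log2 ratio are flagged. The sketch below is a toy version under stated assumptions (median depth as the baseline, arbitrary thresholds of 0.4 and -0.6, no GC or mappability correction, no segmentation); it does not reproduce any specific tool, and all names are illustrative.

```python
import math
from statistics import median


def call_cnv_windows(depths, gain=0.4, loss=-0.6):
    """Label each window 'gain', 'loss', or 'neutral' from its log2 depth ratio.

    The baseline is the median window depth; thresholds are illustrative only.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    labels = []
    for depth in depths:
        if depth == 0:
            labels.append("loss")  # no coverage at all; log2 is undefined
            continue
        ratio = math.log2(depth / baseline)
        if ratio >= gain:
            labels.append("gain")
        elif ratio <= loss:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Made-up mean depths for nine consecutive windows.
depths = [30, 31, 29, 15, 14, 30, 46, 45, 30]
labels = call_cnv_windows(depths)
```

In this example the windows near half the baseline depth are labeled as losses and those near 1.5 times the baseline as gains. Real read-depth callers additionally normalize for sequence bias and merge adjacent windows into segments.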
17.
18.
19.
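Objective: To compare the microbiota profiles obtained from the same specimens on two different next-generation sequencing platforms and their bioinformatics pipelines. Methods: In the same laboratory and with the same bioinformatics analysis protocol, 16S rRNA amplicon sequencing was performed on 56 (28 pairs of) mother-infant fecal samples using two sequencing platforms (Ion Torrent S5-xl and Illumina HiSeq 2500). Correlation analysis, principal component analysis (PCA), principal coordinate analysis (PCoA), and MRPP analysis were used to compare the microbial community structures produced by the two platforms. Results: For alpha diversity, every index except the Shannon index differed significantly between platforms: Chao1 (t = 1.96, P = 0.0011), observed species (t = 2.13, P < 0.0010), PD_whole tree (t = 2.07, P < 0.0010), Simpson (t = 1.87, P = 0.0031), and Good's coverage (t = 2.32, P < 0.0010). Analysis of relative abundance at different taxonomic levels showed that the more bacterial taxa were sequenced and annotated, the lower the correlation in relative abundance between the two platforms. PCA showed that more than 87% of the samples clustered together. PCoA divided the 56 samples from the two platforms into two clusters, with an overlap of 71.43% between platforms. MRPP showed that the microbiota data from the two platforms differed significantly at the family and genus levels (A = 0.0941, P = 0.0010; A = 0.0852, P = 0.0021). When sample origin was taken into account, at the family level the maternal group showed no significant platform difference in microbiota composition (A = 0.0063, P = 0.1491), whereas the neonatal group did (A = 0.0352, P = 0.0061); at the genus level, both the maternal and infant groups showed significant platform differences (A = 0.0216, P = 0.0042; A = 0.0981, P = 0.0010). Conclusion: When the same samples are amplicon-sequenced on different platforms, the relative abundance structure of the microbiota is broadly similar, but diversity and correlation still differ considerably. For reproducibility and reliability of cohort study data, the same sequencing platform and analysis pipeline should be used to reduce bias in microbiota analysis.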
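Several of the alpha diversity indices compared in this abstract can be computed directly from a vector of per-taxon read counts. The sketch below uses standard textbook definitions as assumptions: Shannon with the natural logarithm, Simpson as 1 minus the sum of squared proportions, the bias-corrected Chao1 estimator, and Good's coverage as 1 minus the singleton fraction. The study's own pipeline may use different conventions (for example a base-2 logarithm), and the function name is illustrative.

```python
import math


def alpha_diversity(counts):
    """Return Shannon, Simpson, Chao1, and Good's coverage for one sample."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts]
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return {
        # Shannon entropy with natural log (an assumption; some tools use log2).
        "shannon": -sum(p * math.log(p) for p in proportions),
        # Simpson index reported as 1 - dominance.
        "simpson": 1.0 - sum(p * p for p in proportions),
        # Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).
        "chao1": len(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1)),
        # Good's coverage: share of reads not belonging to singleton taxa.
        "goods_coverage": 1.0 - singletons / total,
        "observed_species": len(counts),
    }


# Made-up counts for six taxa in one sample.
indices = alpha_diversity([10, 5, 2, 1, 1, 1])
```

Computing these indices for each sample on both platforms and applying a paired test would mirror the kind of comparison reported above, although the abstract does not specify the exact test beyond the t statistics.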
20.
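Objective: To use next-generation sequencing to determine the methylation status of the KISS1 and GnRH gene promoter regions in GT1-7 cells, with gold-standard clone sequencing after bisulfite modification as the control, and to compare next-generation sequencing with gold-standard clone sequencing for DNA methylation detection. Methods: Genomic DNA was extracted from GT1-7 cells and treated with bisulfite. Nested PCR was performed, and the PCR products were sequenced by next-generation sequencing. As a control, PCR products from the same batch were also clone-sequenced using the gold-standard bisulfite clone sequencing method. Results: Next-generation sequencing of the PCR products yielded complete and accurate information for the 27 CpG methylation sites of the KISS1 and GnRH genes. First-generation sequencing of 10 picked clones showed no sequence loss and complete information for the same 27 CpG methylation sites. Conclusion: High-throughput next-generation sequencing can effectively analyze PCR products for DNA methylation. Both next-generation sequencing and clone sequencing are effective methods for studying DNA methylation, but in next-generation sequencing each read is equivalent to a single clone, and each region yields hundreds to thousands of reads, so its results are more precise.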
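The conclusion that each read acts like a single clone can be made concrete: after bisulfite treatment an unmethylated cytosine reads as T while a methylated cytosine remains C, so the methylation level at a CpG is the fraction of reads showing C at that position. The sketch below assumes forward-strand reads that span the whole amplicon and align to it without gaps; the sequences and the function name are made up for illustration and are not the KISS1 or GnRH amplicons.

```python
def cpg_methylation(reference, reads):
    """Return {cpg_position: methylated fraction} for gapless forward-strand reads.

    Each read is assumed to have the same length as the reference amplicon.
    Positions with no informative (C or T) reads map to None.
    """
    cpg_positions = [
        i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"
    ]
    levels = {}
    for pos in cpg_positions:
        methylated = sum(1 for read in reads if read[pos] == "C")
        unmethylated = sum(1 for read in reads if read[pos] == "T")
        informative = methylated + unmethylated
        levels[pos] = methylated / informative if informative else None
    return levels


# Toy amplicon with CpG sites at 0-based positions 2 and 7.
reference = "ATCGTTACGA"
reads = ["ATCGTTATGA", "ATTGTTATGA", "ATCGTTACGA", "ATCGTTATGA"]
levels = cpg_methylation(reference, reads)
```

With four reads the estimate moves in steps of 25%, which is the resolution that 10 clones would improve only to 10%; hundreds to thousands of reads per region give a much finer estimate, which is the point the abstract makes.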