Similar literature
1.
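Microbiome data analysis requires familiarity with Linux, which is a major obstacle for biologists who lack computing experience. We therefore designed a simple pipeline for analyzing high-throughput 16S rRNA gene amplicon sequencing data under the Windows Subsystem for Linux (WSL). The pipeline integrates widely used open-source software such as VSEARCH and QIIME, and performs quality control, OTU clustering, diversity analysis, and visualization of results for 16S rRNA sequencing data. Using a salivary microbiome analysis as an example, we describe in detail the parameters and commands for each step from raw data to diversity statistics, together with interpretation of the results. Teaching practice has shown that the pipeline is easy to learn and helps students grasp the basic concepts and methods of microbiome research. By taking advantage of the recent WSL feature of Windows, the pipeline gives Windows users convenient access to the many bioinformatics tools that run on Linux, which should help advance microbiome research. The installer and sequencing data can be downloaded free of charge from http://www.ligene.cn/win16s/.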
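As a rough illustration of the diversity-analysis step mentioned above, the sketch below computes two common alpha-diversity measures (observed OTUs and the Shannon index) from per-OTU read counts. It is not the pipeline's own code, which relies on VSEARCH and QIIME; the function names and example counts are hypothetical.

```python
import math


def observed_otus(otu_counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for count in otu_counts if count > 0)


def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return -sum(
        (count / total) * math.log(count / total)
        for count in otu_counts
        if count > 0
    )


if __name__ == "__main__":
    # Hypothetical read counts for one saliva sample (one entry per OTU).
    sample = [120, 80, 40, 0, 10]
    print("observed OTUs:", observed_otus(sample))
    print("Shannon index:", round(shannon_index(sample), 3))
```

A perfectly even community of four OTUs gives a Shannon index of ln 4, and a single-OTU sample gives 0, which makes a convenient sanity check.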

2.
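李文轲, 李丰余, 张思瑶, 蔡斌, 郑娜, 聂宇, 周到, 赵倩. 《遗传》 (Hereditas), 2014, 36(6): 618-624
Advances in next-generation sequencing have placed heavy demands on the processing and analysis of sequencing data. Many analysis tools exist, but the great majority perform only a single task (for example, sequence alignment, variant calling, or functional annotation alone), so selecting and integrating them correctly and efficiently has become an urgent need. This article describes an automated pipeline, built on Perl and SGE resource management, for analyzing genome sequencing data from the Illumina platform. The pipeline takes raw sequencing reads as input, calls industry-standard data processing software (such as BWA, Samtools, GATK, and ANNOVAR), and produces a list of variant sites with functional annotation that researchers can analyze further. Automated parallel scripts keep the pipeline running efficiently and deliver results and reports in a single pass, which reduces manual work during data analysis and greatly improves throughput. Users only need to fill in a configuration file or enter settings through a graphical interface to complete the whole procedure. This work offers researchers a convenient route to analyzing next-generation sequencing data.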

3.
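Construction and application of a PC/Linux-based nucleic acid sequence analysis system   Total citations: 13, self-citations: 2, citations by others: 11
Based on a PC and the Linux operating system, and using the Phred/Phrap/Consed software and Blast, we built a system for large-scale automated analysis of nucleic acid sequences. The system automatically converts sequencing chromatograms into nucleic acid sequences, removes vector sequences, assembles sequences, identifies repeat sequences, and performs sequence similarity analysis, which can speed up the analysis and use of large-scale sequencing data.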

4.
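With the continued development of metagenomics and the falling cost of sequencing, metagenomics-based identification of antibiotic resistance genes (ARGs) is becoming the mainstream approach. This review summarizes the main techniques and methods for metagenomics-based ARG identification, describes the workflow of each in detail, and assesses their strengths and weaknesses. We conclude that functional metagenomics can discover novel antibiotic resistance genes that are not recorded in existing ARG databases. Machine-learning-based methods have higher sensitivity and specificity than software that relies on alignment and search against ARG databases. Methods based on ARG database search have the advantage of being simple to operate, and online tools are a sensible choice for researchers with limited bioinformatics skills.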

5.
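With the rapid development of next-generation sequencing and the steady accumulation of data, oncologists have gradually shifted their attention from multi-species sequencing to the analysis and comparison of high-throughput sequencing data. Methods for analyzing genetic data are emerging continually, high-throughput omics approaches are being optimized and renewed, and the mining and analysis of genetic data is in a period of rapid growth. The Cancer Genome Atlas (TCGA), a database centered on samples from cancer patients, arose in this context; it comprehensively records genetic data obtained from clinical tumor samples, such as DNA sequences, transcript information, and epigenetic modifications. This article reviews recent progress in the in-depth mining and bioinformatic analysis of tumor-related genetic data from three angles: data analysis methods, the TCGA database, and examples of its application, with the aim of providing a reference for researchers who use big data to discover new targets for cancer prevention and treatment.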

6.
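Chuanbeimu (川贝母), a prized lung-moistening medicine, is in short supply and is a national class III protected plant. Frequent smog in recent years has increased demand for it, leading to serious adulteration of Chinese patent medicines that contain Chuanbeimu. Because patent medicines have complex compositions, the ordinary DNA barcoding workflow alone cannot identify their components in a single step, so it must be combined with analysis of individual clones to identify components one by one. We first built a database of 208 ITS2 sequences from the genus Fritillaria, then collected 20 commercial Chuanbei patent medicines: 15 Shedan Chuanbei capsules (蛇胆川贝胶囊), 4 Chuanbei powder capsules (川贝末胶囊), and 1 Shedan Chuanbei powder (蛇胆川贝散). Their ITS2 sequences were amplified, the PCR products were cloned and sequenced, and the sequences were compared by BLAST. In addition, 3 Shedan Chuanbei capsules were selected for high-throughput sequencing, and Chuanbeimu in these 3 products was identified again by a mixed approach combining clone sequencing and next-generation sequencing (NGS). Component analysis based on NGS data suffers from data redundancy, whereas clone-based results depend on the number of clones picked; combining the two compensates for the weaknesses of each. The clone data showed that, of the 15 Shedan Chuanbei capsules, 3 contained Pingbeimu (平贝母), 10 contained Yibeimu (伊贝母), and 14 contained Huanghuabeimu (黄花贝母), none of which are labeled ingredients. Of the 4 Chuanbei powder capsules, Chuanbeimu was detected in 2 and Pingbeimu in 2, while Huanghuabeimu was also detected in 3. In the single Shedan Chuanbei powder, only Yibeimu and Huanghuabeimu were detected. NGS showed that the 3 Shedan Chuanbei capsules mainly contained unlabeled ingredients such as Pingbeimu and Yibeimu. Comparison of the two sets of sequences showed that both the clone and NGS data indicated that none of the 3 capsules analyzed by mixed sequencing contained Chuanbeimu. This study shows that the clone-based method assisted by NGS can accurately identify Chuanbeimu in patent medicines. Adulteration of Chuanbeimu patent medicines is currently serious, Pingbeimu and Yibeimu are the main adulterants, and supervision of the patent medicine market should be strengthened.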

7.
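Research progress on the application of 16S rRNA sequencing to gut microbiota   Total citations: 3, self-citations: 0, citations by others: 3
16S rRNA sequencing is one of the high-throughput sequencing-based methods for studying gut microbiota. It can precisely quantify all the bacterial species in the gut microbiota and is therefore becoming the mainstream approach for studying changes in the abundance of gut bacterial species. Two questions are critical when applying 16S rRNA sequencing to gut microbiota: how to choose a sequencing strategy suited to the study, and how to analyze the massive data from high-throughput sequencing bioinformatically so as to obtain biologically meaningful results. We discuss the choice of sequencing strategy in terms of sequencing platform, sequenced fragment, and amount of sequencing data, and review commonly used bioinformatic analyses, including sequence clustering and annotation, community structure analysis, and the screening and functional analysis of key taxa.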

8.
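High-throughput RNA sequencing (RNA-seq) gives researchers massive amounts of data. How to analyze these data quickly and effectively, and to support downstream studies of the transcriptome and gene expression, is an active topic in bioinformatics. This article discusses the current state of RNA-seq data analysis and the commonly used software and algorithms, and presents a set of data processing modules and analysis pipelines. To give users a better working environment, we also designed BioCloud, a biological cloud platform based on an elastic resource management system. The platform integrates a rich set of software and uses a highly flexible, highly extensible architecture; it offers users low-cost, high-performance computing as well as customized pipeline services.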

9.
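Data analysis methods for microbial detection based on high-throughput sequencing   Total citations: 1, self-citations: 0, citations by others: 1
The development of high-throughput sequencing is gradually changing research methods in many areas of biology. To meet the need to respond to sudden outbreaks and to threats from newly emerging, unknown microorganisms, microbial identification is moving from traditional physicochemical methods and molecular methods such as nucleic acid hybridization toward rapid detection based on culture-free sequencing data. This brings demands on the accuracy and speed of high-throughput data analysis. Data analysis methods for microbial detection based on high-throughput sequencing data have developed rapidly in recent years. This article surveys these methods, examines their processing workflows and computational approaches, and compares their characteristics and suitable application scenarios. Finally, drawing on our laboratory's work, we summarize problems that may arise when these methods are applied in practice, in the hope of providing a useful reference for research in this field.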

10.
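Although next-generation genome sequencing is increasingly popular, Sanger sequencing remains the gold standard for SNP identification and analysis. Sanger results have traditionally been analyzed with software such as SeqMan. However, most such software relies on manual operation to identify and record SNP sites in the sequencing results, which is inefficient and error-prone. Moreover, when many individuals are sequenced, it cannot manage or export population-level data, which is inconvenient for researchers. Phred/Phrap/Consed/Polyphred is a software package developed at the University of Washington for Unix-like platforms, with powerful functions for managing large-scale sequencing data and for automatically identifying, tagging, and exporting SNPs. Because it is relatively complicated to install and use, however, it is seldom used in China. This study introduces the package's functions, workflow, and characteristics, and provides it installed on Ubuntu 12.04 inside a VMware virtual machine so that geneticists can download and use it easily.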

11.
Next‐generation sequencing (NGS) technology is revolutionizing the fields of population genetics, molecular ecology and conservation biology. But it can be challenging for researchers to learn the new and rapidly evolving techniques required to use NGS data. A recent workshop entitled ‘Population Genomic Data Analysis’ was held to provide training in conceptual and practical aspects of data production and analysis for population genomics, with an emphasis on NGS data analysis. This workshop brought together 16 instructors who were experts in the field of population genomics and 31 student participants. Instructors provided helpful and often entertaining advice regarding how to choose and use an NGS method for a given research question, and regarding critical aspects of NGS data production and analysis such as library preparation, filtering to remove sequencing errors and outlier loci, and genotype calling. In addition, instructors provided general advice about how to approach population genomics data analysis and how to build a career in science. The overarching messages of the workshop were that NGS data analysis should be approached with a keen understanding of the theoretical models underlying the analyses, and with analyses tailored to each research question and project. When analysed carefully, NGS data provide extremely powerful tools for answering crucial questions in disciplines ranging from evolution and ecology to conservation and agriculture, including questions that could not be answered prior to the development of NGS technology.

12.
Next-generation sequencing (NGS) technologies have been widely used in life sciences. However, several kinds of sequencing artifacts, including low-quality reads and contaminating reads, were found to be quite common in raw sequencing data, and they compromise downstream analysis. Therefore, quality control (QC) is essential for raw NGS data. Although a few NGS data quality control tools are publicly available, they have two limitations. First, their processing speed cannot cope with the rapid increase in data volume. Second, with respect to removing contaminating reads, none of them can identify contaminating sources de novo; they rely heavily on prior information about the contaminating species, which is usually not available in advance. Here we report QC-Chain, a fast, accurate and holistic NGS data quality-control method. The tool combines user-friendly components for (1) quality assessment and trimming of raw reads using Parallel-QC, a fast read processing tool; and (2) identification, quantification and filtration of unknown contamination to obtain high-quality clean reads. It was optimized based on parallel computation, so its processing speed is significantly higher than that of other QC methods. Experiments on simulated and real NGS data have shown that reads with low sequencing quality could be identified and filtered. Possible contaminating sources could be identified and quantified de novo, accurately and quickly. Comparison between raw reads and processed reads also showed that the results of subsequent analyses (genome assembly, gene prediction, gene annotation, etc.) based on processed reads improved significantly in completeness and accuracy. With regard to processing speed, QC-Chain achieves a 7–8-fold speed-up based on parallel computation compared with traditional methods. Therefore, QC-Chain is a fast and useful quality control tool for read quality processing and de novo contamination filtration of NGS reads, which could significantly facilitate downstream analysis. QC-Chain is publicly available at: http://www.computationalbioenergy.org/qc-chain.html.
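To make the idea of read-quality trimming concrete, here is a minimal sketch of 3'-end trimming by Phred score. It is only an illustration of the general technique, not QC-Chain's or Parallel-QC's actual algorithm; it assumes Phred+33 quality encoding, and the function names, thresholds, and example read are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(seq, qual, min_q=20, min_len=5):
    """Trim low-quality bases from the 3' end of a read.

    Returns the trimmed (sequence, quality) pair, or None when the read
    is shorter than min_len after trimming and should be discarded.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    end = len(scores)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(trim_read("ACGTACGT", "IIIIII##"))
    print(trim_read("ACGT", "####"))
```

Real QC tools typically add sliding-window trimming, adapter removal, and parallel processing on top of a filter like this.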

13.
14.
The development of next-generation sequencing (NGS) presents an unprecedented opportunity to investigate the complex microbial communities that are associated with the human body. It offers for the first time a basis for detailed temporal and spatial analysis, with the potential to revolutionize our understanding of many clinically important systems. However, while advances continue to be made in areas such as PCR amplification for NGS, sequencing protocols, and data analysis, in many cases the quality of the data generated is undermined by a failure to address fundamental aspects of experimental design. While little is added in terms of time or cost by the analysis of repeat samples, the exclusion of DNA from dead bacterial cells and the extracellular matrix, the use of efficient nucleic acid extraction methodologies, and the implementation of safeguards to minimize the introduction of contaminating nucleic acids, such considerations are essential in achieving an accurate representation of the system being studied. In this review, the chronic bacterial infections that characterize lower respiratory tract infections in cystic fibrosis patients are used as an example system to examine the implications of a failure to address these issues when designing NGS-based analysis of human-associated microbiota. Further, ways in which the impact of these factors can be minimized are discussed.

15.
SUMMARY: Characterizing genetic diversity through genotyping short amplicons is central to evolutionary biology. Next-generation sequencing (NGS) technologies changed the scale at which this type of data is acquired. SESAME is a web application package that assists genotyping of multiplexed individuals for several markers based on NGS amplicon sequencing. It automatically assigns reads to loci and individuals, corrects reads if standard samples are available and provides an intuitive graphical user interface (GUI) for allele validation based on the sequences and associated decision-making tools. The aim of SESAME is to help allele identification among a large number of sequences. AVAILABILITY: SESAME and its documentation are freely available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence for Windows and Linux from http://www1.montpellier.inra.fr/CBGP/NGS/ or http://tinyurl.com/ngs-sesame.

16.
Next-generation sequencing (NGS) has caused a revolution in biology. NGS requires the preparation of libraries in which (fragments of) DNA or RNA molecules are fused with adapters followed by PCR amplification and sequencing. It is evident that robust library preparation methods that produce a representative, non-biased source of nucleic acid material from the genome under investigation are of crucial importance. Nevertheless, it has become clear that NGS libraries for all types of applications contain biases that compromise the quality of NGS datasets and can lead to their erroneous interpretation. A detailed knowledge of the nature of these biases will be essential for a careful interpretation of NGS data on the one hand and will help to find ways to improve library quality or to develop bioinformatics tools to compensate for the bias on the other hand. In this review we discuss the literature on bias in the most common NGS library preparation protocols, both for DNA sequencing (DNA-seq) and for RNA sequencing (RNA-seq). Strikingly, almost all steps of the various protocols have been reported to introduce bias, especially in the case of RNA-seq, which is technically more challenging than DNA-seq. For each type of bias we discuss methods for improvement with a view to providing some useful advice to the researcher who wishes to convert any kind of raw nucleic acid into an NGS library.

17.
Next-generation sequencing (NGS) has transformed molecular biology and contributed to many seminal insights into genomic regulation and function. Apart from whole-genome sequencing, an NGS workflow involves alignment of the sequencing reads to the genome of study, after which the resulting alignments can be used for downstream analyses. However, alignment is complicated by repetitive sequences; many reads align to more than one genomic locus, with 15–30% of the genome not being uniquely mappable by short-read NGS. This problem is typically addressed by discarding reads that do not uniquely map to the genome, but this practice can lead to systematic distortion of the data. Previous studies that developed methods for handling ambiguously mapped reads were often of limited applicability or were computationally intensive, hindering their broader usage. In this work, we present SmartMap: an algorithm that augments industry-standard aligners to enable usage of ambiguously mapped reads by assigning weights to each alignment with Bayesian analysis of the read distribution and alignment quality. SmartMap is computationally efficient, utilizing far fewer weighting iterations than previously thought necessary to process alignments and, as such, analyzing more than a billion alignments of NGS reads in approximately one hour on a desktop PC. By applying SmartMap to peak-type NGS data, including MNase-seq, ChIP-seq, and ATAC-seq in three organisms, we can increase read depth by up to 53% and increase the mapped proportion of the genome by up to 18% compared to analyses utilizing only uniquely mapped reads. We further show that SmartMap enables the analysis of more than 140,000 repetitive elements that could not be analyzed by traditional ChIP-seq workflows, and we utilize this method to gain insight into the epigenetic regulation of different classes of repetitive elements. These data emphasize both the dangers of discarding ambiguously mapped reads and their power for driving biological discovery.
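The general idea of weighting ambiguously mapped reads can be sketched as a simple iterative scheme: each read starts with equal weight on all of its candidate loci, and the weights are then repeatedly redistributed in proportion to the current coverage at each locus. This is a deliberately simplified illustration, not SmartMap's actual algorithm; in particular it ignores alignment quality, and the function name, data layout, and locus labels are hypothetical.

```python
from collections import defaultdict


def reweight_alignments(reads, iterations=3):
    """Assign a weight to each candidate locus of each read.

    reads: list of lists, each inner list holding the distinct candidate
           loci (any hashable label) to which one read aligns.
    Returns a list of dicts mapping locus -> weight; each read's weights sum to 1.
    """
    # Start with a uniform split across every candidate locus of a read.
    weights = [{locus: 1.0 / len(loci) for locus in loci} for loci in reads]
    for _ in range(iterations):
        # Coverage at a locus is the sum of all weights currently assigned to it.
        coverage = defaultdict(float)
        for read_weights in weights:
            for locus, value in read_weights.items():
                coverage[locus] += value
        # Redistribute each read's weight in proportion to locus coverage.
        updated = []
        for read_weights in weights:
            total = sum(coverage[locus] for locus in read_weights)
            updated.append({locus: coverage[locus] / total for locus in read_weights})
        weights = updated
    return weights


if __name__ == "__main__":
    # Two reads map uniquely to locus "A"; a third maps to both "A" and "B".
    for read_weights in reweight_alignments([["A"], ["A"], ["A", "B"]]):
        print(read_weights)
```

In this toy input the ambiguous read shifts most of its weight toward locus "A", because the uniquely mapped reads already support it; uniquely mapped reads always keep a weight of 1.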

18.
19.
Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within‐population variation. Additionally, a public Illumina data set was used to validate the pipeline on community‐level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within‐population structure but also the successful application of the QRS pipeline on Illumina‐generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.
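Pooling similar sequence variants and estimating their frequencies can be sketched with a greedy, abundance-ordered clustering. This is an assumed toy approach for illustration only, not the QRS pipeline's implementation; the crude mismatch-based distance, the function names, and the example sequences are hypothetical.

```python
from collections import Counter


def distance(a, b):
    """Crude distance: positional mismatches plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def pool_variants(sequences, max_dist=1):
    """Pool variants around abundant representatives and return their frequencies.

    Variants are visited from most to least abundant; each one joins the first
    existing representative within max_dist, or becomes a new representative.
    """
    counts = Counter(sequences)
    pooled = {}
    for seq, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        for rep in pooled:
            if distance(seq, rep) <= max_dist:
                pooled[rep] += n
                break
        else:
            pooled[seq] = n
    total = sum(pooled.values())
    return {rep: n / total for rep, n in pooled.items()}


if __name__ == "__main__":
    # Hypothetical amplicon reads: one dominant variant, one near-identical
    # variant (likely a sequencing error), and one clearly distinct variant.
    reads = ["ACGT"] * 6 + ["ACGA"] + ["TTTT"] * 3
    print(pool_variants(reads))
```

Here the single "ACGA" read is absorbed into the "ACGT" representative, so the estimated frequencies describe two pooled variants rather than three raw ones.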

20.