首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 167 毫秒
1.
可变剪接源于多外显子基因生成多个转录本的调控过程。随着高通量测序,尤其是RNA-seq的研究进展,剪接序列和剪接位点可以通过挖掘海量的测序数据进行预测。可变剪接现象拓宽了人们对基因结构和蛋白质亚型的知识。然而现有的短序列比对软件受到随机性比对的影响,产生很多假阳性剪接位点,干扰下游数据分析。本研究发现,可变剪接位点周边序列的结构特征可被深度学习模型提取,并利用深度卷积神经网络识别剪接位点。本研究的模型具有识别率高、计算速度快,模型泛化能力强、鲁棒性高等优势。  相似文献   

2.
李鑫  李凯  李一佳  马磊 《生物信息学》2016,14(3):188-194
SeqMule可根据调用的人类基因组和外显子组数据自动调节变量,对所有测序数据的单核苷酸多态性(Single nucleotide polymorphism,SNP)进行分析和注释。目的:通过对两名痛风患者的实验数据进行分析,详细地为生物信息学研究人员介绍了SeqMule软件,以期为全基因组和外显子组测序数据提供一站式的分析途径。方法:基于SeqMule内置的BWA(BurrowsWheeler Aligner)、GATK(The Genome Analysis Toolkit)、SAMtools、Freebayes比对和分析工具,以两名痛风患者的DNA测序数据分析为例,本文详细地论述了SeqMule的特点及操作,并对两名患者的外显子测序数据进行了自动化比对与SNP分析。发现SeqMule优化了很多分析软件存在的一些问题,可以对外显子组和全基因组测序数据实现全面、灵活、高效地自动化分析,能更好地分析高通量测序数据,最终提升数据分析的一致性和准确性。  相似文献   

3.
目的:构建一个本地化的RNA-Seq数据处理分析平台,为RNA-Seq研究人员提供数据分析平台。方法:在调研现有的RNA-Seq数据分析研究成果的基础上,构建一套本地化的RNA-Seq分析平台,平台首先将测序数据中的低质量数据进行过滤,然后使用Top Hat将过滤后的数据与参考基因组数据进行比对,利用比对结果进行可变剪切分析、基因差异表达分析等,最后通过R语言工具包对分析结果进行可视化绘图。结果:通过对2组小鼠的RNA-Seq测序数据进行分析,构建的分析平台能够较好地过滤低质量测序数据,并且分析出2组数据间的差异表达基因,同时还可以图形化表示这些差异表达基因。结论:分析平台能够实现对RNA-Seq测序数据的质量控制、差异表达分析及分析结果的可视化。  相似文献   

4.
李文轲  李丰余  张思瑶  蔡斌  郑娜  聂宇  周到  赵倩 《遗传》2014,36(6):618-624
二代测序技术的发展对测序数据的处理分析提出了很高的要求。目前二代测序数据分析软件很多, 但是绝大多数软件仅能完成单一的分析功能(例如:仅进行序列比对或变异读取或功能注释等), 如何能正确高效地选择整合这些软件已成为迫切需求。文章设计了一套基于perl语言和SGE资源管理的自动化处理流程来分析Illumina平台基因组测序数据。该流程以测序原始序列数据作为输入, 调用业界标准的数据处理软件(如:BWA, Samtools, GATK, ANNOVAR等), 最终生成带有相应功能注释、便于研究者进一步分析的变异位点列表。该流程通过自动化并行脚本控制流程的高效运行, 一站式输出分析结果和报告, 简化了数据分析过程中的人工操作, 大大提高了运行效率。用户只需填写配置文件或使用图形界面输入即可完成全部操作。该工作为广大研究者分析二代测序数据提供了便利的途径。  相似文献   

5.
目的:大量研究证实线粒体DNA(mtDNA)突变与肿瘤发生及进展密切相关,但使用传统测序方法难以高通量、高精确度的检测mtDNA突变,为此本研究建立了基于新一代测序技术的mtDNA突变检测方法.方法:提取肝癌患者癌、癌旁组织以及外周血细胞总DNA,利用PCR技术对线粒体基因组进行富集并对PCR产物进行平末端、粘性末端连接或对PCR引物进行氨基修饰,构建mtDNA测序文库.经Illumina HiSeq 2000平台测序后利用生物信息学方法与人类mtDNA参考序列进行比对,并进行测序数据分析.结果:通过对不同质量基因组DNA进行评估后,发现三对引物法适用于大部分DNA样本的mtDNA富集.进一步我们发现PCR引物的氨基修饰可显著提高测序数据覆盖均一性,降低测序成本.结论:本研究利用新一代测序技术通过对线粒体DNA富集方法以及测序覆盖度均一性进行优化,建立了一套灵敏、特异、高通量的mtDNA突变检测策略,为mtDNA突变与疾病研究提供了新方法.  相似文献   

6.
RNA编辑是重要的转录后修饰过程,目前已有多种算法用于识别RNA编辑,本文主要研究小鼠中测序深度对RNA编辑识别算法的影响,从而为RNA编辑的研究给出建议的方法. 本文使用STAR比对软件将小鼠的RNA-seq数据进行序列比对,然后使用GATK识别SNV,并用Separate Method、GIREMI、RNAEditor 3种方法识别出RNA编辑位点. 最后对3种方法识别RNA编辑位点的共同部分、识别效率、识别稳定性、识别与测序深度的关系进行分析. 结果发现3种方法识别的编辑位点数目差异大,共有位点较少,随着测序深度的增加,识别的RNA编辑位点数也在增加. 结果表明RNA编辑识别算法在小鼠中的识别性能与测序深度呈正相关.  相似文献   

7.
随着高通量测序技术的快速发展,下一代测序技术也迅速发展为生物领域中的主流技术,而理解下一代测序数据最重要的一步是比对。比对是进行后续生物信息分析的基石,也因此催生了很多比对软件。本文主要选取了四种常用的比对软件Bowtie2、BWA、MAQ和SOAP2,对这四种软件及算法进行综述,并通过实际测序数据对四种软件进行比较和评估,为生物学研究者选择最佳的短序列比对软件提供理论和实践依据。  相似文献   

8.
目的:针对下一代测序数据,尤其是单端测序数据,研究快速、准确查找Indel的方法。方法:先与全基因组参考序列进行快速比对,筛选出包含Indel的序列;再对这些序列进行双向的二次比对,确定Indel长度;最后借助长度信息在锁定范围内查找Indel的确切位置和相关信息。结果:本文成功构建FIND(Fast INDel detection system)系统,用于从单端测序数据中查找Indel信息。以模拟测序数据作为测试数据,在12X测试数据情况下,FIND的灵敏度和特异性分别为87.71%和99.66%,而且该性能还随着测序倍数的增加而提升。结论:充分利用比对过程获取的信息,在确定Indle长度的同时也确定出其大致位置,最终在局部范围内实现对单端测序数据中Indle的快速而准确的查找。  相似文献   

9.
随着高通量测序技术的发展,全外显子测序已经成为一种研究人类疾病的重要方法.本文展示了一种通过Nimblegen2.1M芯片进行外显子DNA序列捕获和高通量测序的方法,包括两步法文库制备.测序的平均覆盖深度达33倍时,95.6%的34M目标区域得到均衡覆盖,特异性达到80%.对比全基因组鸟枪法测序的结果,此方法在检测SNP时的假阳性率为0.97%,假阴性率为6.27%.本方法对于全基因组扩增的DNA也适用.结果显示,全外显子测序技术可以在大规模的群体研究和医学研究中起到重要作用.  相似文献   

10.
随着高通量RNA测序(RNA-Seq)技术的发展和测序成本迅速下降,RNA-Seq技术已经成为生物学研究的重要工具,为生物学家全面地了解和研究转录组提供了机遇。高通量测序具有读长短、存在一定比例的测序错误、数据量大等特点,因此RNA-Seq数据分析与基因组分析和传统的EST数据分析有所不同。本文通过介绍不同的测序平台、原始数据产生和低质量数据过滤的计算流程,对短序列比对、转录组拼接、功能注释、以及差异表达分析进行了研究和分析,最后对RNA-Seq在昆虫学研究中的应用进行了综述,并对RNA-Seq技术进行了总结和展望。  相似文献   

11.
12.
High-throughput sequencing technologies are widely used to analyse genomic variants or rare mutational events in different fields of genomic research, with a fast development of new or adapted platforms and technologies, enabling amplicon-based analysis of single target genes or even whole genome sequencing within a short period of time. Each sequencing platform is characterized by well-defined types of errors, resulting from different steps in the sequencing workflow. Here we describe a universal method to prepare amplicon libraries that can be used for sequencing on different high-throughput sequencing platforms. We have sequenced distinct exons of the CREB binding protein (CREBBP) gene and analysed the output resulting from three major deep-sequencing platforms. platform-specific errors were adjusted according to the result of sequence analysis from the remaining platforms. Additionally, bioinformatic methods are described to determine platform dependent errors. Summarizing the results we present a platform-independent cost-efficient and timesaving method that can be used as an alternative to commercially available sample-preparation kits.  相似文献   

13.
Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. It is, however, unknown which recent WES platform is most suitable to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina applied to six different DNA samples by two sequencing vendors per platform. Our results suggest that both Agilent and NimbleGen overall perform better than Illumina and that the high enrichment performance of Agilent is stable among samples and between vendors, whereas NimbleGen is only able to achieve vendor- and sample-specific best exome coverage. Moreover, the recent Agilent platform overall captures more coding exons with sufficient read depth than NimbleGen and Illumina. Due to considerable gaps in effective exome coverage, however, the three platforms cannot capture all known coding exons alone or in combination, requiring improvement. Our data emphasize the importance of evaluation of updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.  相似文献   

14.
15.
基于PC/Linux的核酸序列分析系统的构建及其应用   总被引:13,自引:2,他引:11  
基于PC机和Linux操作系统, 利用Phred/Phrap/Consed软件和Blast软件, 构建了核酸序列大规模自动分析系统. 该套系统可自动完成从测序峰图向核酸序列的转化、载体序列去除、序列自动拼接、重复序列鉴定以及序列的相似性分析, 可加速对大规模测序数据的分析和利用.  相似文献   

16.
17.
18.
To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites) and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of short-read platform were highly consistent with the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. This might be a challenge because each platform generates a consistent pattern of non-uniform sequence coverage, which, as our study suggests, may affect some types of tandem repeats more than others.  相似文献   

19.
High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.  相似文献   

20.
The emergence of next-generation sequencing (NGS) technologies has significantly improved sequencing throughput and reduced costs. However, the short read length, duplicate reads and massive volume of data make the data processing much more difficult and complicated than the first-generation sequencing technology. Although there are some software packages developed to assess the data quality, those packages either are not easily available to users or require bioinformatics skills and computer resources. Moreover, almost all the quality assessment software currently available didn’t taken into account the sequencing errors when dealing with the duplicate assessment in NGS data. Here, we present a new user-friendly quality assessment software package called BIGpre, which works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account and trimming low quality reads from raw data as well. BIGpre is primarily written in Perl and integrates graphical capability from the statistics package R. This package produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions reads within minutes, this package provides immediate diagnostic information for user to manipulate sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net/.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号