20 similar articles found; search time 167 ms
1.
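Alternative splicing arises from the regulated process by which multi-exon genes generate multiple transcripts. With advances in high-throughput sequencing, especially RNA-seq, splice sequences and splice sites can be predicted by mining massive sequencing datasets. The phenomenon of alternative splicing has broadened our knowledge of gene structure and protein isoforms. However, existing short-read alignment software is affected by random alignments and produces many false-positive splice sites, which interfere with downstream analysis. This study found that structural features of the sequences flanking alternative splice sites can be extracted by deep learning models, and used a deep convolutional neural network to identify splice sites. The model offers a high recognition rate, fast computation, strong generalization, and high robustness.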
2.
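SeqMule automatically adjusts its parameters according to the human genome or exome data supplied, and analyzes and annotates single nucleotide polymorphisms (SNPs) across all sequencing data. Objective: Using experimental data from two gout patients, to introduce the SeqMule software in detail to bioinformatics researchers, with the aim of providing a one-stop analysis route for whole-genome and exome sequencing data. Methods: Based on the alignment and analysis tools built into SeqMule, namely BWA (Burrows-Wheeler Aligner), GATK (The Genome Analysis Toolkit), SAMtools, and FreeBayes, and taking the DNA sequencing data of two gout patients as an example, this paper describes the features and operation of SeqMule in detail and performs automated alignment and SNP analysis of the two patients' exome sequencing data. SeqMule was found to address a number of problems present in many analysis tools. It enables comprehensive, flexible, and efficient automated analysis of exome and whole-genome sequencing data, handles high-throughput sequencing data better, and ultimately improves the consistency and accuracy of data analysis.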
3.
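Objective: To build a local RNA-Seq data processing and analysis platform for RNA-Seq researchers. Methods: Drawing on a survey of existing RNA-Seq data analysis work, a local RNA-Seq analysis platform was built. The platform first filters low-quality data out of the sequencing data, then uses TopHat to align the filtered data to the reference genome, uses the alignment results for alternative splicing analysis, differential gene expression analysis, and related analyses, and finally visualizes the results with R packages. Results: In an analysis of RNA-Seq data from two groups of mice, the platform filtered low-quality sequencing data effectively, identified the genes differentially expressed between the two groups, and displayed those genes graphically. Conclusion: The platform supports quality control of RNA-Seq data, differential expression analysis, and visualization of the results.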
4.
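The development of second-generation sequencing has placed heavy demands on the processing and analysis of sequencing data. Many analysis tools for second-generation sequencing data exist, but the great majority perform only a single function (for example, only sequence alignment, variant calling, or functional annotation), so selecting and integrating these tools correctly and efficiently has become an urgent need. This article presents an automated pipeline, built on Perl and SGE resource management, for analyzing genome sequencing data from the Illumina platform. The pipeline takes raw sequence data as input, calls industry-standard data processing software (such as BWA, SAMtools, GATK, and ANNOVAR), and produces a list of variant sites with functional annotation that is convenient for further analysis. Automated parallel scripts keep the pipeline running efficiently, and results and reports are delivered in one stop, which reduces manual work during data analysis and greatly improves efficiency. Users only need to fill in a configuration file or enter settings through a graphical interface to complete the whole process. This work offers researchers a convenient route for analyzing second-generation sequencing data.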
5.
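Objective: Many studies have confirmed that mitochondrial DNA (mtDNA) mutations are closely associated with tumor initiation and progression, but conventional sequencing methods struggle to detect mtDNA mutations with high throughput and high accuracy. This study therefore established an mtDNA mutation detection method based on next-generation sequencing. Methods: Total DNA was extracted from tumor tissue, adjacent non-tumor tissue, and peripheral blood cells of liver cancer patients. The mitochondrial genome was enriched by PCR, and mtDNA sequencing libraries were constructed either by blunt-end or sticky-end ligation of the PCR products or by amino modification of the PCR primers. After sequencing on the Illumina HiSeq 2000 platform, the reads were aligned to the human mtDNA reference sequence using bioinformatics methods, and the sequencing data were analyzed. Results: After evaluating genomic DNA of differing quality, the three-primer-pair method was found to be suitable for mtDNA enrichment in most DNA samples. Amino modification of the PCR primers further significantly improved the uniformity of sequencing coverage and reduced sequencing cost. Conclusion: By optimizing the mtDNA enrichment method and the uniformity of sequencing coverage with next-generation sequencing, this study established a sensitive, specific, high-throughput strategy for detecting mtDNA mutations, providing a new method for research on mtDNA mutations and disease.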
6.
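RNA editing is an important post-transcriptional modification, and a number of algorithms are now available for identifying it. This paper examines how sequencing depth affects RNA editing identification algorithms in mouse, in order to recommend methods for RNA editing research. Mouse RNA-seq data were aligned with the STAR aligner, SNVs were called with GATK, and RNA editing sites were identified with three methods: Separate Method, GIREMI, and RNAEditor. The three methods were then compared in terms of the sites they shared, identification efficiency, identification stability, and the relationship between identification and sequencing depth. The number of editing sites identified differed widely among the three methods and few sites were shared; the number of RNA editing sites identified increased with sequencing depth. These results indicate that the performance of RNA editing identification algorithms in mouse is positively correlated with sequencing depth.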
7.
8.
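Objective: To develop a fast and accurate method for finding indels in next-generation sequencing data, particularly single-end data. Methods: Reads are first aligned rapidly to the whole-genome reference sequence to select those containing indels; these reads then undergo a second, bidirectional alignment to determine indel length; finally, the length information is used to find the exact position of each indel, and related information, within the narrowed range. Results: The FIND (Fast INDel detection system) system was built to find indels in single-end sequencing data. On simulated test data at 12X, FIND achieved a sensitivity of 87.71% and a specificity of 99.66%, and performance improved further as sequencing depth increased. Conclusion: By making full use of the information obtained during alignment, the method determines the approximate position of an indel at the same time as its length, and so finds indels in single-end sequencing data quickly and accurately within a local range.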
9.
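With the development of high-throughput sequencing, whole-exome sequencing has become an important approach for studying human disease. This paper presents a method for exon DNA sequence capture on the NimbleGen 2.1M array followed by high-throughput sequencing, including a two-step library preparation. At an average coverage depth of 33-fold, 95.6% of the 34M target region was evenly covered, with a specificity of 80%. Compared with whole-genome shotgun sequencing results, the method had a false-positive rate of 0.97% and a false-negative rate of 6.27% for SNP detection. The method is also applicable to whole-genome-amplified DNA. The results show that whole-exome sequencing can play an important role in large-scale population studies and medical research.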
10.
11.
12.
Marc W. Fuellgrabe Dietrich Herrmann Henrik Knecht Sven Kuenzel Michael Kneba Christiane Pott Monika Brüggemann 《PloS one》2015,10(6)
High-throughput sequencing technologies are widely used to analyse genomic variants or rare mutational events in different fields of genomic research, with a fast development of new or adapted platforms and technologies, enabling amplicon-based analysis of single target genes or even whole genome sequencing within a short period of time. Each sequencing platform is characterized by well-defined types of errors, resulting from different steps in the sequencing workflow. Here we describe a universal method to prepare amplicon libraries that can be used for sequencing on different high-throughput sequencing platforms. We have sequenced distinct exons of the CREB binding protein (CREBBP) gene and analysed the output resulting from three major deep-sequencing platforms. Platform-specific errors were adjusted according to the result of sequence analysis from the remaining platforms. Additionally, bioinformatic methods are described to determine platform-dependent errors. Summarizing the results, we present a platform-independent, cost-efficient and timesaving method that can be used as an alternative to commercially available sample-preparation kits.
13.
Janine Meienberg Katja Zerjavic Irene Keller Michal Okoniewski Andrea Patrignani Katja Ludin Zhenyu Xu Beat Steinmann Thierry Carrel Benno Röthlisberger Ralph Schlapbach Rémy Bruggmann Gabor Matyas 《Nucleic acids research》2015,43(11):e76
Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. It is, however, unknown which recent WES platform is most suitable to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina applied to six different DNA samples by two sequencing vendors per platform. Our results suggest that both Agilent and NimbleGen overall perform better than Illumina and that the high enrichment performance of Agilent is stable among samples and between vendors, whereas NimbleGen is only able to achieve vendor- and sample-specific best exome coverage. Moreover, the recent Agilent platform overall captures more coding exons with sufficient read depth than NimbleGen and Illumina. Due to considerable gaps in effective exome coverage, however, the three platforms cannot capture all known coding exons alone or in combination, requiring improvement. Our data emphasize the importance of evaluation of updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.
14.
15.
16.
17.
18.
To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites) and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of short-read platform were highly consistent with the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. 
This might be a challenge because each platform generates a consistent pattern of non-uniform sequence coverage, which, as our study suggests, may affect some types of tandem repeats more than others.
19.
High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.
20.
The emergence of next-generation sequencing (NGS) technologies has significantly improved sequencing throughput and reduced costs. However, the short read length, duplicate reads and massive volume of data make the data processing much more difficult and complicated than with first-generation sequencing technology. Although some software packages have been developed to assess data quality, those packages either are not easily available to users or require bioinformatics skills and computer resources. Moreover, almost all the quality assessment software currently available does not take sequencing errors into account when assessing duplicates in NGS data. Here, we present a new user-friendly quality assessment software package called BIGpre, which works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account, and to trim low-quality reads from raw data as well. BIGpre is primarily written in Perl and integrates graphical capability from the statistics package R. This package produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions of reads within minutes, this package provides immediate diagnostic information for users to manipulate sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net/.