Similar literature
1.
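By analyzing next-generation high-throughput sequencing data from COVID-19 patients in the outbreak in Washington State, USA, from March to the end of April 2020, this study identifies all mutation types present in the SARS-CoV-2 spike glycoprotein (S protein), providing baseline data for studying the mutation patterns of the virus during replication in vivo and for vaccine research. Next-generation sequencing data from 130 COVID-19 patients reported in the Washington area and published in NCBI were assembled, and the S protein-coding gene was subjected to deep mutation analysis to find potential suspected antigen variation sites (hereafter, mutation sites). After excluding 30 data sets that did not yield full-length sequences, complete SARS-CoV-2 sequences were obtained from 100 patients. In the S protein-coding gene, the main mutation sites were concentrated in the SP region of S1, in the spacer region preceding the receptor-binding domain (RBD) (where mutations were distributed almost continuously: 126aa~153aa and 194aa~204aa), and in the 1250aa~1270aa region of S2. The data from the 100 samples show that the S gene is relatively stable during replication in patients: amino acid-changing mutation sites (mutation frequency 15%) were scattered across single samples, and the S2 region was more stable than the S1 region. No site with a mutation rate of 20% was found in the RBD; the sporadic mutations in the S gene were concentrated mainly in the S1 spacer region (126aa~153aa and 194aa~204aa, distributed almost continuously) and in roughly the last 20 aa of S2.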

2.
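Using next-generation sequencing data to discover genomic mutations in cancer cells has long been an important applied scientific problem. This study used a large data set from one cancer patient to evaluate several existing tools for identifying genomic mutations. Comparing the methods and accuracy of the tools shows that each has its own strengths and weaknesses. Based on these, the paper offers recommendations to help users choose a suitable tool.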

3.
宋琳琳  顾朝辉  韦朝春  陈赛娟 《生物磁学》2009,(15):2899-2902,2912
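Objective: To study data analysis and quality assessment methods suited to next-generation sequencing data, which are large in volume and short in read length. Methods: Published sequencing data from the Illumina-Solexa platform were aligned to the whole human genome sequence with the MAQ software and, taking exon regions as an example, sequencing data quality was assessed at the site level. Results: Combining existing software with a linear algorithm developed in this work, a sequencing data quality assessment system covering alignment and assembly was established. After alignment, the raw reads covered 127,113,378 sites, involving 64,868 exons on 24 chromosomes. Exons in which every site was sequenced accounted for 0.50%, and exons with a mean per-site sequencing depth of at least 1 accounted for 3.98%. Conclusion: A data analysis and quality assessment method based on the Illumina-Solexa sequencing platform was successfully constructed and is applicable to other next-generation sequencing platforms. Researchers can use the quality assessment to refine sequencing experiment design and to carry out SNP and mutation screening and subsequent functional studies.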
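The two exon-level figures reported in this abstract (exons with every site covered, and exons with a mean per-site depth of at least 1) can be illustrated with a minimal Python sketch. The function name `exon_coverage_stats` and the toy depth values are assumptions for illustration only; this is not the linear algorithm developed in the study, whose details the abstract does not give.

```python
def exon_coverage_stats(exon_depths):
    """Summarise per-exon coverage.

    exon_depths maps a (hypothetical) exon id to a list of per-site read depths.
    Returns (fraction of exons with every site covered,
             fraction of exons with mean per-site depth >= 1).
    """
    n_exons = len(exon_depths)
    fully_covered = 0
    mean_depth_ge_1 = 0
    for depths in exon_depths.values():
        if not depths:
            continue  # an exon with no sites contributes to neither count
        if all(d > 0 for d in depths):
            fully_covered += 1
        if sum(depths) / len(depths) >= 1:
            mean_depth_ge_1 += 1
    return fully_covered / n_exons, mean_depth_ge_1 / n_exons


# Toy data, not taken from the study.
toy_exons = {
    "exon1": [1, 2, 3],   # every site covered, mean depth 2
    "exon2": [0, 4, 0],   # gaps, but mean depth above 1
    "exon3": [0, 0, 1],   # sparse coverage
    "exon4": [0, 0, 0],   # not covered
}
frac_full, frac_mean = exon_coverage_stats(toy_exons)
```

An exon can reach a mean depth of 1 while still having uncovered sites, which is why the second fraction is never smaller than the first.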

4.
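Tuberculosis is a serious public health problem, and the rise of drug-resistant tuberculosis is one of the main obstacles to controlling the epidemic. Rapid and accurate diagnosis is key to raising cure rates and lowering mortality among tuberculosis patients. This study established an amplicon sequencing method based on next-generation sequencing to examine 17 resistance genes for 5 first-line anti-tuberculosis drugs. A total of 65 mutations were identified in 26 clinical drug-resistant isolates, including 33 hotspot mutations, 9 rare mutations, and 23 novel mutations. Protein sequence conservation and local protein structure were analyzed for 18 newly found missense mutations. The results show that 14 of the novel missense mutations are highly conserved across 9 mycobacterial species and alter the local structure of the protein. Based on these results, the newly found mutations are inferred to be potential resistance mutations. The amplicon sequencing assay built in this study can examine 17 resistance genes in 10 clinical isolates simultaneously and is a rapid, accurate, and comprehensive method for detecting mutations that confer resistance to first-line drugs in Mycobacterium tuberculosis. It detects hotspot and rare mutations and can also uncover previously unreported mutations. The assay may be useful for clinical diagnosis and basic research.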

5.
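The transcriptome of germinated buckwheat was sequenced with the next-generation high-throughput platform Illumina Solexa Hiseq 2500, and bioinformatic methods were used to study gene expression profiles and predict functional genes. Sequencing yielded 42,953,962 reads containing 5.37 Gb of sequence. Assembly of the reads produced 45,278 unigenes with a mean length of 862 bp, totaling 39 Mb of sequence. The unigenes were also assessed by length distribution, GC content, and expression level; the data indicate good sequencing quality and high reliability. Sequence homology comparison against databases showed that 2,127 unigenes share varying degrees of homology with known genes of other organisms. Unigenes in the germinated tartary buckwheat transcriptome were associated with cellular process, cell, and protein binding. Aligned against the KOG database, the unigenes fell into roughly 24 functional classes. With the KEGG database as reference, the unigenes were mapped to 328 metabolic pathway branches, including the ribosome pathway and carbohydrate metabolism, and 38 unigenes in oxidative phosphorylation metabolism involved in GABA synthesis were screened out. An SSR search found 7,141 SSR loci in 71,366 unigenes. Among SSR repeat motif types, A/T was the most frequent, followed by AAG/CTT and AT/AT.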
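The SSR search mentioned in this abstract can be sketched as a regular-expression scan for short tandemly repeated units. The abstract does not name the tool or the repeat thresholds used, so the function `find_ssrs` and the minimum repeat counts below (10 for mononucleotide, 6 for dinucleotide, 5 for trinucleotide units) are assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, unit, repeat_count) tuples for simple sequence repeats.

    min_repeats maps unit length to the minimum number of tandem copies;
    the defaults are illustrative assumptions, not the study's settings.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One unit captured, then at least n - 1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if unit_len > 1 and len(set(unit)) == 1:
                continue  # e.g. "AA" repeats are a mononucleotide run, not a di-repeat
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence containing the three motif classes named in the abstract.
toy_seq = "GC" + "A" * 12 + "GCGT" + "AG" * 7 + "C" + "AAG" * 5 + "T"
ssrs = find_ssrs(toy_seq)
```

Counting motifs by canonical class (for example treating AAG and CTT as one type) would need an extra reverse-complement step that this sketch leaves out.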

6.
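Objective: To observe the diversity and composition of the microbiota at different intestinal sites in the tree shrew. Methods: Contents of the ileum, cecum, and colon were collected from 3 male tree shrews and DNA was extracted; the V4 region of bacterial 16S rDNA was amplified and sequenced on the Illumina PE250 high-throughput sequencing platform, and microbiota structure and richness were analyzed. Results: The numbers of optimized sequences from the ileum, cecum, and colon did not differ significantly. In the α-diversity analysis, the Chao1, PD, Simpson, and Shannon-Wiener indices of the three intestinal sites did not differ significantly; relative to the ileum, the cecum and colon were more similar to each other in diversity. Rank-Abundance curves showed that the ileal microbiota had higher richness and a more even distribution. In the β-diversity analysis, the ileal microbiota showed smaller structural differences, whereas the cecal and colonic microbiota showed larger structural differences. A total of 26 phyla were detected, 17 of which were shared by all 3 groups. Synergistetes, Rokubacteria, Thaumarchaeota, and TA06 were found only in the ileum; Chlamydiae was unique to the cecum; Elusimicrobia was found only in the colon. A total of 414 genera were obtained, with 15, 7, and 3 genera unique to the colon, cecum, and ileum, respectively. A total of 530 species were found, of which Lactobacillus salivarius was the most abundant. Random Forest analysis identified 7 biomarkers across the ileum, cecum, and colon. Conclusion: The diversity of the microbiota in the tree shrew ileum, cecum, and colon does not differ significantly, but the ileal microbiota has higher richness and a more even distribution than the colonic and cecal microbiota. Each of the 3 intestinal sites has its own distinctive microbiota.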
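Two of the α-diversity measures named in this abstract have simple closed forms and can be sketched in a few lines of Python. The Simpson index has several conventions in the literature; the form used here, one minus the sum of squared proportions, is an assumption, since the abstract does not state which form was applied. The count vectors are hypothetical, not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon-Wiener index: H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity in the form 1 - sum(p_i ** 2) (one of several conventions)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical read counts per taxon for two samples.
even_sample = [40, 30, 20, 10]     # reads spread across taxa
skewed_sample = [97, 1, 1, 1]      # one dominant taxon

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

Both indices rise as reads are spread more evenly across taxa, which is why an evenly distributed community scores higher than one dominated by a single taxon.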

7.
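DNA methylation, as an epigenetic modification, plays an important role in regulating gene expression, X-chromosome inactivation, gene imprinting, and related processes. Different DNA methylation pretreatment methods combined with next-generation sequencing have produced large amounts of high-throughput methylation data, and storing, processing, and analyzing these data is a pressing problem. This paper summarizes the three current high-throughput DNA methylation detection approaches (restriction enzyme digestion, affinity purification, and bisulfite conversion), together with the storage, processing, and analysis tools developed for the high-throughput data they generate. It also gives particular attention to single-base-resolution DNA methylation detection, covering the sequencing principle of BS-Seq, its data processing workflow, and downstream analysis tools.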

8.
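Objective: Many studies have confirmed that mitochondrial DNA (mtDNA) mutations are closely associated with tumor initiation and progression, but conventional sequencing methods struggle to detect mtDNA mutations with high throughput and high accuracy. This study therefore established an mtDNA mutation detection method based on next-generation sequencing. Methods: Total DNA was extracted from tumor tissue, adjacent non-tumor tissue, and peripheral blood cells of liver cancer patients. The mitochondrial genome was enriched by PCR, and mtDNA sequencing libraries were constructed either by blunt-end or sticky-end ligation of the PCR products or by amino modification of the PCR primers. After sequencing on the Illumina HiSeq 2000 platform, reads were aligned to the human mtDNA reference sequence using bioinformatic methods and the sequencing data were analyzed. Results: Evaluation of genomic DNA of varying quality showed that the three-primer-pair method is suitable for mtDNA enrichment in most DNA samples. Amino modification of the PCR primers further markedly improved the uniformity of sequencing coverage and reduced sequencing cost. Conclusion: By optimizing the mtDNA enrichment method and the uniformity of sequencing coverage, this study established a sensitive, specific, and high-throughput mtDNA mutation detection strategy based on next-generation sequencing, providing a new method for research on mtDNA mutations and disease.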

9.
Strategies for genome-wide association studies based on high-throughput sequencing
周家蓬  裴智勇  陈禹保  陈润生 《遗传》2014,36(11):1099-1111
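Genome-wide association studies (GWAS) are an important part of research on complex human diseases; they test, at the population level, for genetic association between genome-wide genetic variants and observable traits. Conventional GWAS obtain high-density genetic variants with array technology and, although highly productive, have several problems: the so-called "missing heritability", in which variants reaching genome-wide significance in association analysis explain only a small part of heritability; weak consistency between studies for some traits; and the difficulty of interpreting the function of significantly associated variants. High-throughput sequencing, also called next-generation sequencing (NGS), can produce high-throughput variant data quickly and accurately, offering a feasible way to address these problems. GWAS based on NGS (NGS-GWAS) can compensate, to some extent, for the shortcomings of conventional GWAS. This article systematically surveys NGS-GWAS strategies and methods, proposes implementation strategies and methods that are currently feasible, and discusses prospects for applying NGS-GWAS to personalized medicine (PM).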

10.
韩伟  张庆珍  杨静  周喆 《遗传》2024,(4):306-318
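In recent years, the number of complex cases in forensic practice has grown, requiring the combined use of different types of genetic markers, such as short tandem repeats (STR), single nucleotide polymorphisms (SNP), insertion/deletion polymorphisms (InDel), and microhaplotypes (MH), to provide more reference information for casework. This study selected 201 genetic markers, comprising 24 autosomal STRs (A-STR), 24 Y-chromosomal STRs (Y-STR), 110 A-SNPs, 24 Y-SNPs, 9 A-InDels, 1 Y-InDel, 8 MHs, and Amelogenin, and established a next-generation sequencing assay, HIDAM Panel v1.0. Following the validation guidelines of the Scientific Working Group on DNA Analysis Methods (SWGDAM), the assay was evaluated for repeatability, accuracy, sensitivity, suitability for degraded samples, species specificity, resistance to inhibitors, and other measures. The typing results of this assay and those based on capillary electro...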

11.

Background

DNA-based methods like PCR efficiently identify and quantify the taxon composition of complex biological materials, but are limited to detecting species targeted by the choice of the primer assay. We show here how untargeted deep sequencing of foodstuff total genomic DNA, followed by bioinformatic analysis of sequence reads, facilitates highly accurate identification of species from all kingdoms of life, at the same time enabling quantitative measurement of the main ingredients and detection of unanticipated food components.

Results

Sequence data simulation and real-case Illumina sequencing of DNA from reference sausages composed of mammalian (pig, cow, horse, sheep) and avian (chicken, turkey) species are able to quantify material correctly at the 1% discrimination level via a read counting approach. An additional metagenomic step facilitates identification of traces from animal, plant and microbial DNA including unexpected species, which is prospectively important for the detection of allergens and pathogens.
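The read counting approach mentioned in these results can be sketched as follows: reads assigned to each reference species are counted and expressed as a fraction of all assigned reads. This is a minimal Python illustration under stated assumptions; the function name `species_fractions` and the counts are hypothetical, and the sketch omits any normalization (for example for genome size) that a real pipeline might apply.

```python
def species_fractions(read_counts):
    """Convert per-species assigned read counts into fractions of the total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any species")
    return {species: n / total for species, n in read_counts.items()}


# Hypothetical counts of reads assigned to each reference genome.
toy_counts = {"pig": 850, "cow": 100, "horse": 40, "sheep": 10}
fractions = species_fractions(toy_counts)

# Species at or above a 1% share of assigned reads.
detected = sorted(sp for sp, f in fractions.items() if f >= 0.01)
```

With these toy numbers the rarest component sits exactly at the 1% level that the abstract reports as the discrimination limit.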

Conclusions

Our data suggest that deep sequencing of total genomic DNA from samples of heterogeneous taxon composition promises to be a valuable screening tool for reference species identification and quantification in biosurveillance applications like food testing, potentially alleviating some of the problems in taxon representation and quantification associated with targeted PCR-based approaches.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-639) contains supplementary material, which is available to authorized users.

12.
Pseudogenes are frequently encountered noncoding sequences with a high sequence similarity to their protein-coding paralogue. For this reason, their presence is often considered troublesome in molecular diagnostics. In pseudoxanthoma elasticum (PXE), a disease predominantly caused by mutations in ATP-binding cassette family C member 6 (ABCC6), the presence of two pseudogenes complicates the analysis of sequence data. With whole-exome sequencing (WES) becoming the standard of care in molecular diagnostics, we wanted to evaluate whether this technique is as reliable as gene-specific targeted enrichment analysis for the analysis of ABCC6. We established a PCR-based targeted enrichment and next-generation sequencing testing approach and demonstrated that the ABCC6-specific enrichment combined with the applied mapping algorithm overcomes the complication of ABCC6 pseudogene aspecificities, contrary to WES. We propose a time- and cost-efficient diagnostic strategy for comprehensive and accurate molecular genetic testing of PXE, which is highly automatable.

13.
Little is known about the inheritance of very low heteroplasmy mitochondrial DNA (mtDNA) variations. Even with the development of new next-generation sequencing methods, the practical lower limit of measured heteroplasmy is still about 1% due to the inherent noise level of the sequencing. In this study, we sequenced the mitochondrial genome of 44 individuals using Illumina high-throughput sequencing technology and obtained high-coverage mitochondrial sequencing data. Our study population contains many mother-offspring pairs. This unique study design allows us to bypass the usual heteroplasmy limitation by analyzing the correlation of mutation levels at each position in the mtDNA sequence between maternally related pairs and non-related pairs. The study showed that very low heteroplasmy variants, down to almost 0.1%, are inherited maternally and that this inheritance begins to decrease at about 0.5%, corresponding to a bottleneck of about 200 mtDNA.

14.

Background

Influenza viruses exist as a large group of closely related viral genomes, also called quasispecies. The composition of this influenza viral quasispecies can be determined by an accurate and sensitive sequencing technique and data analysis pipeline. We compared the suitability of two benchtop next-generation sequencers for whole genome influenza A quasispecies analysis: the Illumina MiSeq sequencing-by-synthesis and the Ion Torrent PGM semiconductor sequencing technique.

Results

We first compared the accuracy and sensitivity of both sequencers using plasmid DNA and different ratios of wild type and mutant plasmid. Illumina MiSeq sequencing reads were one and a half times more accurate than those of the Ion Torrent PGM. The majority of sequencing errors were substitutions on the Illumina MiSeq and insertions and deletions, mostly in homopolymer regions, on the Ion Torrent PGM. To evaluate the suitability of the two techniques for determining the genome diversity of influenza A virus, we generated plasmid-derived PR8 virus and grew this virus in vitro. We also optimized an RT-PCR protocol to obtain uniform coverage of all eight genomic RNA segments. The sequencing reads obtained with both sequencers could successfully be assembled de novo into the segmented influenza virus genome. After mapping of the reads to the reference genome, we found that the detection limit for reliable recognition of variants in the viral genome required a frequency of 0.5% or higher. This threshold exceeds the background error rate resulting from the RT-PCR reaction and the sequencing method. Most of the variants in the PR8 virus genome were present in hemagglutinin, and these mutations were detected by both sequencers.
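The 0.5% detection limit described in these results amounts to a frequency filter on non-reference bases at each position. The Python sketch below illustrates that filter only; the function name `call_variants`, the dictionary layout of the per-position base counts, and the toy numbers are assumptions, and the sketch is not the data analysis pipeline proposed in the article.

```python
def call_variants(base_counts, reference, min_freq=0.005):
    """Report non-reference bases whose frequency is at least min_freq.

    base_counts: position -> {base: read count} (hypothetical layout)
    reference:   position -> reference base
    Returns a list of (position, base, frequency) tuples.
    """
    calls = []
    for pos in sorted(base_counts):
        counts = base_counts[pos]
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to call
        for base, n in sorted(counts.items()):
            if base == reference[pos]:
                continue
            freq = n / depth
            if freq >= min_freq:
                calls.append((pos, base, freq))
    return calls


# Toy pileup: one variant at 1% and one at 0.2%.
toy_pileup = {10: {"A": 990, "G": 10}, 20: {"C": 998, "T": 2}}
toy_reference = {10: "A", 20: "C"}
calls = call_variants(toy_pileup, toy_reference)
```

Only the variant at position 10 passes here; the one at position 20 falls below the threshold and would be treated as indistinguishable from background error.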

Conclusions

Our approach underlines the power and limitations of two commonly used next-generation sequencers for the analysis of influenza virus gene diversity. We conclude that the Illumina MiSeq platform is better suited for detecting variant sequences whereas the Ion Torrent PGM platform has a shorter turnaround time. The data analysis pipeline that we propose here will also help to standardize variant calling in small RNA genomes based on next-generation sequencing data.

15.
16.

Background

A minor but significant fraction of samples subjected to next-generation sequencing methods are either mixed-up or cross-contaminated. These events can lead to false or inconclusive results. We have therefore developed SASI-Seq; a process whereby a set of uniquely barcoded DNA fragments are added to samples destined for sequencing. From the final sequencing data, one can verify that all the reads derive from the original sample(s) and not from contaminants or other samples.

Results

By adding a mixture of three uniquely barcoded amplicons, of different sizes spanning the range of insert sizes one would normally use for Illumina sequencing, at a spike-in level of approximately 0.1%, we demonstrate that these fragments remain intimately associated with the sample. They can be detected following even the tightest size selection regimes or exome enrichment and can report the occurrence of sample mix-ups and cross-contamination. As a consequence of this work, we have designed a set of 384 eleven-base Illumina barcode sequences that are at least 5 changes apart from each other, allowing for single-error correction and very low levels of barcode misallocation due to sequencing error.
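The barcode property described in these results, a minimum separation that permits single-error correction, can be checked with a short Python sketch. It assumes that "changes" means substitutions, so distance is measured as Hamming distance; the abstract does not say whether insertions and deletions are also counted. The four eleven-base barcodes are toy sequences, not members of the published set.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_barcode(tag, barcodes, max_errors=1):
    """Return the single barcode within max_errors mismatches of tag, else None."""
    hits = [b for b in barcodes if hamming(tag, b) <= max_errors]
    return hits[0] if len(hits) == 1 else None


# Toy eleven-base barcodes, constructed to be at least 5 substitutions apart.
toy_barcodes = ["AAAAAAAAAAA", "CCCCCAAAAAA", "AAAAAACCCCC", "GGGGGGGGGGG"]
min_dist = min_pairwise_distance(toy_barcodes)

corrected = assign_barcode("CCCCCAAAAAT", toy_barcodes)   # one substitution
rejected = assign_barcode("CCCAAAAAAAA", toy_barcodes)    # too far from every barcode
```

With a minimum distance of 5, a tag carrying one error is still at least 4 mismatches from every other barcode, so it can be assigned unambiguously, and a tag with two errors is rejected rather than misallocated.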

Conclusion

SASI-Seq is a simple, inexpensive and flexible tool that enables sample assurance, allows deconvolution of sample mix-ups and reports levels of cross-contamination between samples throughout NGS workflows.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-110) contains supplementary material, which is available to authorized users.

17.
Next-generation sequencing (NGS) is becoming routine in the diagnosis of hereditary diseases, such as human cardiomyopathies. Hence, it is of the utmost importance to secure high-quality sequencing data, enabling the identification of disease-relevant mutations or the conclusion of negative test results. During the process of sample preparation, each protocol for target enrichment library preparation has its own requirements for quality control (QC); however, there is little evidence on the actual impact of these guidelines on resulting data quality. In this study, we analyzed the impact of QC during the diverse library preparation steps of Agilent SureSelect XT target enrichment and Illumina sequencing. We quantified the parameters for a cohort of around 600 samples, which include starting amount of DNA, amount of sheared DNA, smallest and largest fragment size of the starting DNA; amount of DNA after the pre-PCR, and smallest and largest fragment size of the resulting DNA; as well as the amount of the final library, the corresponding smallest and largest fragment size, and the number of detected variants. Intriguingly, there is a high tolerance for variations in all QC steps, meaning that within the boundaries proposed in the current study, a considerable variance at each step of QC can be well tolerated without compromising NGS quality.

18.

Background

Human leukocyte antigen (HLA) is a group of genes that are extremely polymorphic among individuals and populations and have been associated with more than 100 different diseases and adverse drug effects. HLA typing is accordingly an important tool in clinical application, medical research, and population genetics. We have previously developed a phase-defined HLA gene sequencing method using MiSeq sequencing.

Results

Here we report a simple, high-throughput, and cost-effective sequencing method that includes normalized library preparation and adjustment of DNA molar concentration. We applied long-range PCR to amplify HLA-B for 96 samples followed by transposase-based library construction and multiplex sequencing with the MiSeq sequencer. After sequencing, we observed low variation in read percentages (0.2% to 1.55%) among the 96 demultiplexed samples. On this basis, all the samples were amenable to haplotype phasing using our phase-defined sequencing method. In our study, a sequencing depth of 800x was necessary and sufficient to achieve full phasing of HLA-B alleles with reliable assignment of the allelic sequence to the 8 digit level.

Conclusions

Our HLA sequencing method optimized for 96 multiplexing samples is highly time effective and cost effective and is especially suitable for automated multi-sample library preparation and sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-645) contains supplementary material, which is available to authorized users.

19.
This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. But the massive data produced by NGS also present a significant challenge for data storage, analysis, and management solutions. Advanced bioinformatic tools are essential for the successful application of NGS technology. As evidenced throughout this review, NGS technologies will have a striking impact on genomic research and the entire biological field. With its ability to tackle challenges unconquered by previous genomic technologies, NGS is likely to unravel the complexity of the human genome in terms of genetic variations, some of which may be confined to susceptible loci for some common human conditions. The impact of NGS technologies on genomics will be far reaching and likely change the field for years to come.

20.

Background

Second-generation sequencers generate millions of relatively short, but error-prone, reads. These errors make sequence assembly and other downstream projects more challenging. Correcting these errors improves the quality of assemblies and projects which benefit from error-free reads.

Results

We have developed a general-purpose error corrector that corrects errors introduced by Illumina, Ion Torrent, and Roche 454 sequencing technologies and can be applied to single- or mixed-genome data. In addition to correcting substitution errors, we locate and correct insertion, deletion, and homopolymer errors while remaining sensitive to low coverage areas of sequencing projects. Using published data sets, we correct 94% of Illumina MiSeq errors, 88% of Ion Torrent PGM errors, 85% of Roche 454 GS Junior errors. Introduced errors are 20 to 70 times more rare than successfully corrected errors. Furthermore, we show that the quality of assemblies improves when reads are corrected by our software.

Conclusions

Pollux is highly effective at correcting errors across platforms, and is consistently able to perform as well or better than currently available error correction software. Pollux provides general-purpose error correction and may be used in applications with or without assembly.

