Similar articles
 Found 20 similar articles (search time: 156 ms)
1.
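Song Linlin, Gu Zhaohui, Wei Chaochun, Chen Saijuan, 《生物磁学》 2009, (15): 2899-2902, 2912
Objective: To develop data-analysis and quality-assessment methods suited to next-generation sequencing data, which are large in volume and short in read length. Methods: Published Illumina-Solexa sequencing data were aligned to the whole human genome with MAQ and, taking exonic regions as an example, the quality of the sequencing data was assessed at the level of individual sites. Results: By combining existing software with a linear algorithm devised in this work, a quality-assessment system for sequencing data that covers both alignment and assembly was established. After alignment, the raw reads covered 127,113,378 sites in total, spanning 64,868 exons on 24 chromosomes. Of these exons, 0.50% had every site sequenced, and 3.98% had a mean per-site sequencing depth of at least 1. Conclusion: A data-analysis and quality-assessment method for the Illumina-Solexa platform was successfully established, and it is applicable to other second-generation sequencing platforms. On the basis of this quality assessment, researchers can refine their sequencing experimental design and proceed to SNP and mutation screening and subsequent functional studies.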
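The two exon-level figures reported in item 1 (the share of exons with every site sequenced, and the share with a mean per-site depth of at least 1) can be illustrated with a minimal Python sketch. This is not the authors' linear algorithm, which the abstract does not describe; the function names and the toy depth values are hypothetical, and per-site depths are assumed to have already been extracted from the alignment.

```python
from typing import Dict, List, Tuple


def exon_coverage(depths: List[int]) -> Tuple[float, float]:
    """Return (fraction of sites with depth >= 1, mean depth) for one exon."""
    if not depths:
        return 0.0, 0.0
    covered = sum(1 for d in depths if d >= 1)
    return covered / len(depths), sum(depths) / len(depths)


def summarize_exons(exon_depths: Dict[str, List[int]]) -> Dict[str, float]:
    """Percentage of exons fully covered, and percentage with mean depth >= 1."""
    total = len(exon_depths)
    if total == 0:
        return {"pct_fully_covered": 0.0, "pct_mean_depth_ge_1": 0.0}
    fully = 0
    deep = 0
    for depths in exon_depths.values():
        frac, mean = exon_coverage(depths)
        if depths and frac == 1.0:  # every site in the exon was sequenced
            fully += 1
        if mean >= 1.0:  # mean per-site depth of at least 1
            deep += 1
    return {
        "pct_fully_covered": 100.0 * fully / total,
        "pct_mean_depth_ge_1": 100.0 * deep / total,
    }


if __name__ == "__main__":
    # Hypothetical per-site depths for four toy exons.
    toy = {"e1": [1, 2, 1], "e2": [0, 3, 0], "e3": [0, 0, 0], "e4": [0, 0, 1, 0]}
    print(summarize_exons(toy))
```

Note that an exon can reach a mean depth of 1 without being fully covered (toy exon `e2`), which is why the two percentages in the abstract differ.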

2.
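Amplification primers and a sequencing primer were designed from a sequence specific to Salmonella typhimurium in order to establish a pyrosequencing method for detecting this organism. The specific amplification primers were used to amplify the target fragment by PCR; a single-stranded template was then prepared and pyrosequenced with the sequencing primer. All 6 S. typhimurium strains from different sources yielded a fragment with the sequence TACAACCGGA GTGCACATTA ATCCCGCAGC, whereas none of the 30 negative-control strains was amplified. A BLAST search showed that this sequence matches S. typhimurium sequences in GenBank with 100% identity. Pyrosequencing is a rapid and accurate detection method and can be used for rapid detection of S. typhimurium in food.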

3.
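Huang Fangliang, 《生物信息学》 2015, 13(2): 116-119
To explore ways of accelerating bacterial genome research, the genome of one strain of the single-celled bacterium Geobacter sulfurreducens was sequenced on the ABI PGM platform. Sequencing yielded 1.4 Gbp of data with a mean read length of 177 bp. Using several assembly programs and a suitable assembly strategy, a complete 3.55 Mbp bacterial genome and a complete 110 kbp plasmid sequence were obtained. The sequenced genome is 94% similar to the reference genome KN400, and 91% of the reference genes have a similar gene in the sequenced genome. This study shows that the ABI PGM platform, combined with a flexible assembly strategy, can rapidly produce a fine-scale map of a bacterial genome, providing accurate data for subsequent functional annotation and in-depth analysis and greatly speeding up research.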

4.
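With the rapid development of high-throughput sequencing, next-generation sequencing has quickly become a mainstream technology in biology, and the most important step in making sense of next-generation sequencing data is alignment. Alignment is the foundation of all downstream bioinformatic analysis, which has given rise to many alignment programs. This paper reviews four commonly used aligners, Bowtie2, BWA, MAQ and SOAP2, together with their algorithms, and compares and evaluates them on real sequencing data, providing a theoretical and practical basis for biologists choosing a short-read aligner.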

5.
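A comparison of direct sequencing and clone sequencing of the nrDNA ITS region of weedy rice shows that both approaches yield accurate nrDNA ITS sequences. Direct sequencing is convenient and economical, but its chromatograms are of poorer quality and show overlapping peaks. The ITS region of weedy rice is 587-589 bp long: ITS1 is 193-194 bp, ITS2 is 230-232 bp, and the 5.8S region is 164 bp in all cases. Multiple comparison with published ITS sequences from some Oryza species shows that the region carries abundant phylogenetic information and can provide molecular-level evidence on how weedy rice arises.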

6.
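The transcriptome of germinated buckwheat was sequenced on the Illumina Solexa HiSeq 2500 high-throughput platform, and bioinformatic methods were used to study gene expression profiles and predict functional genes. Sequencing produced 42,953,962 reads containing 5.37 Gb of sequence. Assembly of the reads gave 45,278 unigenes with a mean length of 862 bp, totalling 39 Mb. Evaluation of the unigenes by length distribution, GC content and expression level indicated good sequencing quality and high reliability. Homology searches against databases showed that 2,127 unigenes share varying degrees of homology with known genes from other organisms. The unigenes of the germinated Tartary buckwheat transcriptome are associated with cellular process, cell, and protein binding. Alignment against the KOG database sorted the unigenes into roughly 24 functional classes. With the KEGG database as reference, the unigenes mapped to 328 metabolic pathway branches, including the ribosome pathway and carbohydrate metabolism, and 38 unigenes involved in the oxidative phosphorylation metabolism linked to GABA synthesis were identified. An SSR search found 7,141 SSR loci among 71,366 unigenes. Among the SSR repeat motif types, A/T was the most frequent, followed by AAG/CTT and AT/AT.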

7.
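The monokaryotic Flammulina velutipes strain "6-3" was sequenced with both second- and third-generation technologies, and four assembly strategies were applied for de novo genome assembly and compared. In terms of assembly metrics, assembly from second-generation data alone performed worst: contigs longer than 10 kb totalled only 24.6 Mb, the contig N50 was only 23 kb, and the assembly rate was only 59.27%. Assembling third-generation reads and correcting them with second-generation reads performed best: contigs longer than 10 kb totalled 38.3 Mb, the contig N50 was 2.8 Mb, and the assembly rate reached 92.16%. In terms of conserved single-copy genes, when the genome sequences from the four strategies were compared against the conserved single-copy basidiomycete genes in the BUSCO database, gene completeness exceeded 94% in every case. In terms of assembly accuracy, PCR amplification and Sanger sequencing confirmed that the genome sequence from third-generation assembly with second-generation correction was complete and contiguous and contained the fewest SNPs and InDels. In summary, third-generation assembly with second-generation correction gives a genome sequence with a large contig N50, a high assembly rate and high base accuracy, making it a relatively ideal approach for genome sequencing of edible fungi.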
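Item 7 compares assemblies by contig N50 and by the total length of contigs longer than 10 kb. As a reminder of how these two metrics are usually defined, here is a small Python sketch; the function names and the toy contig lengths are hypothetical, and the "assembly rate" in the abstract is not reproduced because its definition is not given.

```python
from typing import List


def contig_n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the assembly
            return length
    return 0  # unreachable for non-empty positive input


def total_length_above(lengths: List[int], min_len: int) -> int:
    """Summed length of contigs strictly longer than min_len."""
    return sum(length for length in lengths if length > min_len)


if __name__ == "__main__":
    # Hypothetical contig lengths in bp.
    contigs = [2_000, 15_000, 40_000, 8_000, 120_000]
    print(contig_n50(contigs), total_length_above(contigs, 10_000))
```

A larger N50 at a similar total length means the same sequence is held in fewer, longer contigs, which is the sense in which the hybrid strategy in item 7 is described as more contiguous.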

8.
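Using Illumina HiSeq™ 2500 high-throughput sequencing, 109,021,918 clean reads (16.35 Gb) were obtained from the transcriptome of the vascular cambium and xylem zone of Paulownia. De novo assembly of the clean reads gave 104,432 unigenes with a mean length of 662 nt. Sequence comparison against public databases functionally annotated 40,789 (Nr: 39.05%), 31,675 (NT: 30.33%), 15,539 (COG: 14.87%), 29,168 (GO: 27.93%), 16,316 (KEGG: 15.62%), 30,499 (SwissProt: 29.20%) and 28,828 (Pfam: 27.6%) unigenes. In the GO analysis, the 29,168 annotated unigenes fell into 55 functional groups under the three main categories of biological process, cellular component and molecular function; the 15,539 unigenes annotated in COG were divided into 25 categories; and KEGG assigned 16,316 unigenes to 130 metabolic pathways. In addition, 16,118 simple sequence repeat (SSR) loci were detected in the vascular cambium and xylem zone transcriptome. These results provide a large body of data for further mining of important functional genes in Paulownia.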

9.
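Pepino (Solanum muricatum, 香瓜茄), also known as 人参果, has antioxidant, antitumour, antidiabetic and other biological activities. This study aimed to enrich the genomic information available for solanaceous crops and their evolutionary history, to obtain the whole-genome sequence of pepino, and to lay a foundation for molecular research on the species. Using pepino plant tissue as material, a small-fragment library was built on Illumina HiSeq for genome survey, and PacBio third-generation sequencing and Hi-C were used to construct and assemble the whole genome. Bioinformatic methods were used for assembly, functional annotation and evolutionary analysis of the resulting genome sequence. The results were as follows: 54.11 Gb of Illumina HiSeq data were obtained; 55.08 Gb of PacBio data were obtained, with a mean read length of 14,179 bp; about 143 Gb of Hi-C data were obtained; the assembled contigs totalled 1.16 Gb, with a contig N50 of 22.63 Mb after Hi-C error correction; Hi-C anchoring placed 1.12 Gb of sequence on 12 chromosomes (97.16%); of this, 1.08 Gb could be ordered and oriented, accounting for 96.11% of the total length of chromosome-anchored sequence, giving a genome size of 1.25 Gb; repeats were predicted to make up 64.22% of the genome, and 41,571 genes were predicted, 99.06% of which could be annotated in NR, GO, KEGG and other databases; 4,360 tRNAs, 5,677 rRNAs and 154 miRNAs were predicted; and 449 pseudogenes were found. Pepino and potato were estimated to have diverged about 12.82 MYA.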

10.
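Objective: To use second-generation sequencing to determine the methylation status of the promoter regions of the KISS1 and GnRH genes in GT1-7 cells, with gold-standard clone sequencing after bisulfite modification as the control, and to compare the two methods for DNA methylation detection. Methods: Genomic DNA was extracted from GT1-7 cells and treated with bisulfite. Nested PCR was performed and the PCR products were subjected to second-generation sequencing. As a control, the same batch of PCR products was also clone-sequenced using the gold-standard bisulfite clone-sequencing method. Results: Second-generation sequencing of the PCR products gave complete and accurate information for the 27 CpG methylation sites of the KISS1 and GnRH genes. First-generation sequencing of 10 picked clones showed no loss of sequence and complete information for the 27 CpG methylation sites of the two genes. Conclusion: High-throughput second-generation sequencing can effectively analyse PCR products for DNA methylation. Both second-generation sequencing and clone sequencing are effective methods for studying DNA methylation, but in second-generation sequencing each read is equivalent to a single clone and each region yields hundreds to thousands of reads, so its results are more precise.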

11.
12.

Background

Bisulfite sequencing using next generation sequencers yields genome-wide measurements of DNA methylation at single nucleotide resolution. Traditional aligners are not designed for mapping bisulfite-treated reads, where the unmethylated Cs are converted to Ts. We have developed BS Seeker, an approach that converts the genome to a three-letter alphabet and uses Bowtie to align bisulfite-treated reads to a reference genome. It uses sequence tags to reduce mapping ambiguity. Post-processing of the alignments removes non-unique and low-quality mappings.

Results

We tested our aligner on synthetic data, a bisulfite-converted Arabidopsis library, and human libraries generated from two different experimental protocols. We evaluated the performance of our approach and compared it to other bisulfite aligners. The results demonstrate that among the aligners tested, BS Seeker is more versatile and faster. When mapping to the human genome, BS Seeker generates alignments significantly faster than RMAP and BSMAP. Furthermore, BS Seeker is the only alignment tool that can explicitly account for tags which are generated by certain library construction protocols.

Conclusions

BS Seeker provides fast and accurate mapping of bisulfite-converted reads. It can work with BS reads generated from the two different experimental protocols, and is able to efficiently map reads to large mammalian genomes. The Python program is freely available at http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html.
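The three-letter-alphabet idea in the Background of item 12 can be sketched in a few lines of Python. This is only an illustration of the principle, not BS Seeker's code: exact substring search stands in for Bowtie, only forward-strand reads are handled, tags and post-processing are ignored, and all function names are hypothetical.

```python
from typing import List, Tuple


def c_to_t(seq: str) -> str:
    """Collapse a sequence to the three-letter alphabet A/G/T by turning every C into T."""
    return seq.upper().replace("C", "T")


def map_bisulfite_read(read: str, reference: str) -> int:
    """Return the 0-based forward-strand position where the converted read
    matches the converted reference exactly, or -1 if there is none.
    (Assumption: exact matching replaces a real aligner such as Bowtie.)"""
    return c_to_t(reference).find(c_to_t(read))


def call_methylation(read: str, reference: str, pos: int) -> List[Tuple[int, bool]]:
    """For each reference C under the read, report (position, is_methylated).
    A C kept in the read is taken as methylated; a T is taken as an
    unmethylated C converted by bisulfite treatment."""
    calls = []
    window = reference[pos:pos + len(read)].upper()
    for offset, (ref_base, read_base) in enumerate(zip(window, read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls.append((pos + offset, True))
        elif read_base == "T":
            calls.append((pos + offset, False))
    return calls


if __name__ == "__main__":
    reference = "AACGTTCGGA"
    read = "CGTTTG"  # from position 2; the second C was unmethylated and reads as T
    pos = map_bisulfite_read(read, reference)
    print(pos, call_methylation(read, reference, pos))
```

Mapping is done in the reduced alphabet so that C/T differences caused by conversion do not count as mismatches; the methylation state is then recovered by going back to the original read and reference bases.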

13.

Background

RNA sequencing (RNA-seq) is the current gold-standard method to quantify gene expression for expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele of variant loci can have lower probability to map correctly to the reference genome, which could bias gene quantifications and cause false positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias in eQTL discovery.

Results

We simulate RNA-seq read mapping over 9.5 M common SNPs and indels, with 15.6% of variants showing biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and eQTL discovery. We detect only a handful of likely false positive eQTLs, and overall eQTL SNPs show no significant enrichment for high mapping bias.

Conclusion

Our results suggest that RNA-seq quantifications are generally robust against allelic mapping bias, and that this does not have a severe effect on eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci to allow better controlling for mapping bias to obtain more accurate results in future RNA-seq studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0467-2) contains supplementary material, which is available to authorized users.

14.
Identification of candidate genomic regions associated with target traits using conventional mapping methods is challenging and time‐consuming. In recent years, a number of single nucleotide polymorphism (SNP)‐based mapping approaches have been developed and used for identification of candidate/putative genomic regions. However, in the majority of these studies, insertions–deletions (Indels) were largely ignored. For efficient use of Indels in mapping target traits, we propose the Indel‐seq approach, which is a combination of whole‐genome resequencing (WGRS) and bulked segregant analysis (BSA) and relies on the Indel frequencies in extreme bulks. Deployment of the Indel‐seq approach for identification of candidate genomic regions associated with fusarium wilt (FW) and sterility mosaic disease (SMD) resistance in pigeonpea has identified 16 Indels affecting 26 putative candidate genes. Of these 26 affected putative candidate genes, 24 genes showed effect in the upstream/downstream of the genic region and two genes showed effect in the genes. Validation of these 16 candidate Indels in other FW‐ and SMD‐resistant and FW‐ and SMD‐susceptible genotypes revealed a significant association of five Indels (three for FW and two for SMD resistance). Comparative analysis of Indel‐seq with other genetic mapping approaches highlighted the importance of the approach in identification of significant genomic regions associated with target traits. Therefore, the Indel‐seq approach can be used for quick and precise identification of candidate genomic regions for any target traits in any crop species.

15.
Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

16.
With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/

17.
Recent advances in sequencing technology have enabled the rapid generation of billions of bases at relatively low cost. A crucial first step in many sequencing applications is to map those reads to a reference genome. However, when the reference genome is large, finding accurate mappings poses a significant computational challenge due to the sheer amount of reads, and because many reads map to the reference sequence approximately but not exactly. We introduce Hobbes, a new gram-based program for aligning short reads, supporting Hamming and edit distance. Hobbes implements two novel techniques, which yield substantial performance improvements: an optimized gram-selection procedure for reads, and a cache-efficient filter for pruning candidate mappings. We systematically tested the performance of Hobbes on both real and simulated data with read lengths varying from 35 to 100 bp, and compared its performance with several state-of-the-art read-mapping programs, including Bowtie, BWA, mrsFast and RazerS. Hobbes is faster than all other read mapping programs we have tested while maintaining high mapping quality. Hobbes is about five times faster than Bowtie and about 2–10 times faster than BWA, depending on read length and error rate, when asked to find all mapping locations of a read in the human genome within a given Hamming or edit distance, respectively. Hobbes supports the SAM output format and is publicly available at http://hobbes.ics.uci.edu.
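Item 17 refers to gram-based filtering and to Hamming and edit distance. The Python sketch below shows the two distances and a plain pigeonhole seed filter: a read split into k+1 non-overlapping pieces must match at least one piece exactly at any location with at most k mismatches. It is a textbook illustration under that assumption, not the gram-selection procedure or cache-efficient filter of Hobbes, and the function names are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) by row-wise DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # match or substitution
        prev = curr
    return prev[-1]


def find_hamming_hits(read: str, reference: str, k: int) -> List[int]:
    """All 0-based reference positions where the read aligns with <= k mismatches."""
    n = len(read)
    pieces = k + 1
    if pieces > n:
        raise ValueError("read too short for k + 1 seeds")
    bounds = [i * n // pieces for i in range(pieces + 1)]
    candidates = set()
    for i in range(pieces):
        start, end = bounds[i], bounds[i + 1]
        seed = read[start:end]
        hit = reference.find(seed)
        while hit != -1:
            pos = hit - start  # implied start of the whole read
            if 0 <= pos <= len(reference) - n:
                candidates.add(pos)
            hit = reference.find(seed, hit + 1)
    # Verify each candidate; the filter only prunes, it never confirms.
    return sorted(p for p in candidates if hamming(read, reference[p:p + n]) <= k)


if __name__ == "__main__":
    print(find_hamming_hits("ACGTTAGA", "ACGTACGTTTGACCA", 1))
```

The filter step is where real mappers gain their speed: only positions supported by an exact seed are verified, instead of every position in the reference.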

18.

Background

Massively parallel sequencing offers an enormous potential for expression profiling, in particular for interspecific comparisons. Currently, different platforms for massively parallel sequencing are available, which differ in read length and sequencing costs. The 454-technology offers the highest read length. The other sequencing technologies are more cost effective, at the expense of shorter reads. Reliable expression profiling by massively parallel sequencing depends crucially on the accuracy with which the reads can be mapped to the corresponding genes.

Methodology/Principal Findings

We performed an in silico analysis to evaluate whether incorrect mapping of the sequence reads results in a biased expression pattern. A comparison of six available mapping software tools indicated a considerable heterogeneity in mapping speed and accuracy. Independently of the software used to map the reads, we found that for compact genomes both short (35 bp, 50 bp) and long sequence reads (100 bp) result in an almost unbiased expression pattern. In contrast, for species with a larger genome containing more gene families and repetitive DNA, shorter reads (35–50 bp) produced a considerable bias in gene expression. In humans, about 10% of the genes had fewer than 50% of the sequence reads correctly mapped. Sequence polymorphism up to 9% had almost no effect on the mapping accuracy of 100 bp reads. For 35 bp reads up to 3% sequence divergence did not affect the mapping accuracy strongly. The effect of indels on the mapping efficiency strongly depends on the mapping software.

Conclusions/Significance

In complex genomes, expression profiling by massively parallel sequencing could introduce a considerable bias due to incorrectly mapped sequence reads if the read length is short. Nevertheless, this bias could be accounted for if the genomic sequence is known. Furthermore, sequence polymorphisms and indels also affect the mapping accuracy and may cause a biased gene expression measurement. The choice of the mapping software is highly critical and the reliability depends on the presence/absence of indels and the divergence between reads and the reference genome. Overall, we found SSAHA2 and CLC to produce the most reliable mapping results.

19.
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.

20.
Accurate estimation of expression levels from RNA-Seq data entails precise mapping of the sequence reads to a reference genome. Because the standard reference genome contains only one allele at any given locus, reads overlapping polymorphic loci that carry a non-reference allele are at least one mismatch away from the reference and, hence, are less likely to be mapped. This bias in read mapping leads to inaccurate estimates of allele-specific expression (ASE). To address this read-mapping bias, we propose the construction of an enhanced reference genome that includes the alternative alleles at known polymorphic loci. We show that mapping to this enhanced reference reduced the read-mapping biases, leading to more reliable estimates of ASE. Experiments on simulated data show that the proposed strategy reduced the number of loci with mapping bias by ≥63% when compared with a previous approach that relies on masking the polymorphic loci and by ≥18% when compared with the standard approach that uses an unaltered reference. When we applied our strategy to actual RNA-Seq data, we found that it mapped up to 15% more reads than the previous approaches and identified many seemingly incorrect inferences made by them.
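One way to picture the enhanced reference of item 20 is as the standard reference plus short extra sequences that carry the alternative allele at each known SNP. The Python sketch below makes that assumption explicit; the abstract does not say how the authors actually encode the alternative alleles, so the window layout, the `flank` parameter and the function names are all hypothetical, and exact substring matching stands in for a real read mapper.

```python
from typing import Dict


def build_enhanced_reference(reference: str, snps: Dict[int, str], flank: int) -> Dict[str, str]:
    """Return the original reference plus one window per SNP carrying the alternative allele.
    snps maps a 0-based position to its alternative base (hypothetical input format)."""
    enhanced = {"ref": reference}
    for pos, alt in sorted(snps.items()):
        start = max(0, pos - flank)
        end = min(len(reference), pos + flank + 1)
        # Same flanking sequence as the reference, with the alternative base at the SNP.
        enhanced[f"alt_{pos}_{alt}"] = reference[start:pos] + alt + reference[pos + 1:end]
    return enhanced


def maps_exactly(read: str, sequences: Dict[str, str]) -> bool:
    """True if the read occurs without mismatch in any of the sequences."""
    return any(read in seq for seq in sequences.values())


if __name__ == "__main__":
    reference = "GATTACAGGCTTAACG"
    snps = {7: "T"}              # reference has G at position 7
    alt_read = "ACATGC"          # read carrying the non-reference allele
    enhanced = build_enhanced_reference(reference, snps, flank=4)
    print(maps_exactly(alt_read, {"ref": reference}), maps_exactly(alt_read, enhanced))
```

With only the standard reference, the read carrying the alternative allele is one mismatch away and fails an exact match; against the enhanced set it matches exactly, which is the effect the abstract credits with reducing mapping bias.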
