Similar literature
 Found 19 similar articles; search took 218 ms
1.
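Transcript assembly is a key step in transcriptome research based on second-generation sequencing. Its quality directly affects the reliability of downstream results, and it remains both a research focus and a challenge. Transcript assembly methods fall into two classes, genome-guided and de novo, each with strengths and weaknesses in theoretical basis and algorithmic implementation. Assembly quality depends on the PCR amplification error rate, the accuracy of second-generation sequencing, the assembly algorithm, and the completeness of the reference genome, and existing algorithms cannot yet fully handle the effects of these factors. This paper discusses transcript assembly methods and software, the factors that affect assembly quality, and the metrics used to evaluate it, with the aim of guiding biologists without a computational background in choosing analysis software.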

2.
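[Objective] Notonecta chinensis, distributed in China and in Okinawa, Japan, is an important aquatic natural-enemy insect that can be used for the biological control of mosquitoes. This study aimed to build a transcriptome database for N. chinensis and to mine its gene information. [Methods] The Illumina NextSeq500 high-throughput sequencing platform was used for transcriptome sequencing, de novo assembly, and bioinformatic analysis of N. chinensis; MISA software was used, based on...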

3.
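Chickens have unique biological characteristics. Whole-genome sequencing means sequencing all the genes in an organism's genome, and it mainly comprises de novo sequencing and whole-genome resequencing. In recent years, with the rapid development of sequencing technology, chicken whole-genome research has made many important advances, which have played an important role in explaining the biological characteristics of chickens and in shortening the molecular breeding cycle. This review focuses on the completion of de novo whole-genome sequencing for different chicken breeds and on the wide application of whole-genome resequencing in analyzing breed genetic diversity, revealing evolutionary mechanisms, and uncovering the genetic mechanisms of qualitative traits. It also discusses the open problems and prospects of chicken whole-genome sequencing, aiming to provide an important reference for improving domestic chicken germplasm resources and for molecular breeding.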

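4.
Recently, researchers from the Institute of Hydrobiology of the Chinese Academy of Sciences, the National Center for Gene Research of the Chinese Academy of Sciences, Sun Yat-sen University, and other institutions jointly completed the whole-genome sequence map of the grass carp (Ctenopharyngodon idellus). Using a shotgun sequencing strategy, they sequenced the whole genomes of one female and one male grass carp and, through a modified de novo Phusion-meta assembly, obtained genome assemblies for the female (0.9 GB) and the male (1.07 GB). The female was a gynogenetic individual produced by artificial meiotic gynogenesis, which markedly increased genome homozygosity. Compared with other completed aquatic animal genomes...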

5.
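Drugs such as Kangfuxin liquid, produced from the American cockroach Periplaneta americana, show marked clinical efficacy and are widely used. Using medicinal American cockroaches reared by Sichuan Good Doctor Panxi Pharmaceutical Co., Ltd. as material, whole-genome sequencing was carried out for the first time on the Illumina HiSeq 2000 and PacBio SMRT platforms, followed by genome assembly, annotation, and analysis. After filtering, the raw data yielded 1.4 Tb of second-generation and 33.81 Gb of third-generation sequencing data. The assembly indicates a genome size of 3.26 Gb for P. americana, second only to the Oriental migratory locust Locusta migratoria among reported insect genomes. Repetitive sequences make up 62.38% of the genome and heterozygosity is 0.635%, indicating a complex genome. The contig N50 and scaffold N50 are 28.2 kb and 315 kb, respectively; single-copy gene completeness is 88.1%; and the mean mapping rate of the short-insert library reads is 99.8%, so sequencing and assembly quality meet the needs of downstream analysis. Three approaches (de novo prediction, homology-based prediction, and transcript-based prediction) annotated 14,568 genes in total, of which 92.4% received functional annotation. This study is the first to complete whole-genome sequencing of the American cockroach and provides the first genome for the genus Periplaneta, laying an important foundation for genetic and evolutionary analysis of the species and for mining its medicinal gene resources.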
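
The contig N50 and scaffold N50 quoted above are length-weighted medians: the N50 is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to half
        # of the assembly size defines the N50.
        if running >= half:
            return length
    raise ValueError("no lengths given")


# Hypothetical contig lengths (arbitrary units), not data from the abstract.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract reports gene completeness and read mapping rate alongside it.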

6.
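Using RNA-Seq short-read data generated on the high-throughput Illumina platform and deposited in public databases, for Drosophila F1 progeny and cultivated rice, we compared eight de novo transcriptome assemblers (ABySS, Velvet, SOAPdenovo, Oases, Trinity, Multiple-k, T-IDBA and Trans-ABySS). Of the two classes of software, those based on a single k-mer and those based on multiple k-mers, Trinity and Trans-ABySS respectively showed the best assembly performance, while the other programs performed similarly to one another. We also found that multiple k-mer methods assemble more total bases than single k-mer methods, but that even with the best multiple k-mer assembler the quality of the resulting data is lower than researchers would expect. We therefore propose the "ETM" optimization method, which incorporates the multiple k-mer approach into Trinity so that it keeps the best assembly performance while gaining the advantages of multiple k-mers; tests show that the method has some advantage. Our results give users a basis for choosing suitable software and are significant for advancing transcriptome research based on high-throughput Illumina sequencing.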
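
To make the single k-mer versus multiple k-mer distinction concrete, the sketch below counts k-mers and builds de Bruijn graph edges for several values of k over the same reads. It only illustrates why the choice of k changes the graph an assembler works on; it is not the "ETM" method or the internals of any assembler named above, and the function names and toy reads are assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_edges(reads, k):
    """One edge per distinct k-mer, linking its (k-1)-mer prefix to its suffix."""
    return {(kmer[:-1], kmer[1:]) for kmer in count_kmers(reads, k)}


# Hypothetical overlapping reads, for illustration only.
reads = ["ATGGCGT", "GGCGTGC"]
for k in (3, 4, 5):
    print(k, len(de_bruijn_edges(reads, k)))
```

Small k values connect more reads but collapse repeats, while large k values resolve repeats but break low-coverage regions, which is the usual motivation for combining several k values.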

7.
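[Objective] Streptomyces sp. PRh5 is an endophytic actinomycete isolated from Dongxiang wild rice (Oryza rufipogon Griff.) that shows strong antimicrobial activity against both bacteria and fungi. To study the antimicrobial mechanism of strain PRh5 in depth and to mine its secondary-metabolite gene resources, its genome sequence needs to be resolved. [Methods] The whole genome of strain PRh5 was sequenced with high-throughput sequencing, and software was then used for genome assembly, gene prediction and functional annotation, Clusters of Orthologous Groups (COG) analysis, synteny analysis, and prediction of secondary-metabolite biosynthetic gene clusters. [Results] Assembly yielded 290 contigs, with a total genome size of about 11.1 Mb and a GC content of 71.1%; the sequence has been deposited in GenBank under accession number JABQ00000000. In addition, 50 secondary-metabolite biosynthetic gene clusters were predicted. [Conclusion] These results provide a basis for functional genomics of Streptomyces sp. PRh5 and for studying the biosynthetic pathways and heterologous expression of the related secondary metabolites.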

8.
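Library preparation methods for next-generation sequencing (NGS) play an important role in genome assembly. However, ordinary DNA library fragments prepared for NGS are only about 500 bp long, which makes it hard to meet the requirements of de novo assembly of complex genomes. Third-generation sequencing can reach read lengths of 20 kb, but its high error rate and high cost have kept it from wide adoption. Mate-paired library preparation for second-generation sequencing has therefore long played a very important role in de novo genome assembly. Mate-paired libraries prepared on Illumina, currently the mainstream NGS platform, span a fragment range of only 2 to 5 kb. To obtain longer mate-paired libraries that can be sequenced on the Illumina platform, this study integrated and optimized, for the first time, the mate-paired library preparation techniques of the Illumina and Roche/454 platforms, used an inducible circularization enzyme to improve the circularization efficiency of long genomic DNA fragments, and successfully established a 20 kb mate-paired library preparation technique, which has been applied to preparing 20 kb mate-paired libraries of the human genome. The technique offers methodological guidance for preparing long-fragment mate-paired libraries on the Illumina platform.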

9.
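The beauty rat snake Elaphe taeniura is widely distributed across East and Southeast Asia and has important ecological and economic value. The snake liver is an important metabolic organ of considerable research value, yet no transcriptome data have been available to date. In this study, liver cDNA of E. taeniura was sequenced on an Illumina HiSeq 2000 to analyze its gene expression profile and classify gene functions. A total of 14,807,224 read pairs were obtained, with a mean read length of 93 bp. De novo assembly produced 93,357 contigs and 71,119 unigenes; protein-coding sequences were found for 88,907 contigs, and 26,134 predicted proteins could be assigned biological functions in 24 categories. In addition, 253 proteins involved in defense mechanisms were identified. These genes provide baseline data for enriching reptile genome databases and a reference for the conservation, farming, and pharmaceutical development of snake resources.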

10.
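De novo assembly of a reference transcriptome and development of SSR molecular markers for Ascosphaera apis   Cited by: 1 (self-citations: 0; citations by others: 1)
[Objective] RNA-seq was used to sequence purely cultured spores of Ascosphaera apis and gut tissue from honeybee larvae infected with A. apis, in order to assemble a reference transcriptome of A. apis de novo, annotate its functions and metabolic pathways, and then develop SSR molecular markers from the transcriptome data. [Methods] Activated A. apis spores were first obtained by differential centrifugation, and diet containing 1×10⁷ spores/mL was fed to 4-, 5-, and 6-day-old larvae of the Italian honeybee Apis mellifera ligustica and the Chinese honeybee Apis cerana cerana. The larval guts and the purified spores were deep-sequenced on the Illumina HiSeq™ 2500 platform. After filtering of the raw data, unigenes were assembled with Trinity and annotated for function and metabolic pathways by BLASTX against the NCBI Nr, Swiss-Prot, KOG, and KEGG databases. MISA was used to search all unigenes for SSRs, Primer Premier 5 was used to design SSR-specific primers, and SSR loci of A. apis from different sources were amplified by PCR. [Results] A total of 146,135,308 high-quality reads were obtained, and de novo assembly produced 42,609 unigenes. BLASTX showed that 29,316 unigenes had functional and pathway annotations in these public databases. The largest number of unigenes (6,050) were annotated to Xanthophyllomyces dendrorhous. KEGG annotation assigned unigenes to 117 metabolic pathways, with the largest number (529) enriched in the ribosome pathway. In total, 7,968 SSRs were predicted across all unigenes, and five A. apis SSR molecular markers were developed by PCR. [Conclusion] This study successfully assembled a reference transcriptome of A. apis and annotated its functions and metabolic pathways, providing important reference information for in-depth molecular studies of A. apis. The five SSR markers developed from this transcriptome can advance research on strain identification, genetic map construction, and gene mapping.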
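
The SSR search step can be pictured as scanning each unigene for short motifs repeated in tandem. The sketch below does this with a regular expression for di- and trinucleotide repeats. It is a simplified illustration rather than MISA itself: the function name `find_ssrs`, the minimum repeat thresholds, and the toy sequence are all assumptions, and it expects uppercase A/C/G/T input.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for tandem repeats in seq.

    min_repeats maps motif length to the minimum number of tandem copies.
    The default thresholds are illustrative assumptions.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A motif of unit_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as TTTTTT
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence containing an (AG)7 and a (GAT)5 repeat.
toy_unigene = "TTT" + "AG" * 7 + "CCC" + "GAT" * 5 + "T"
print(find_ssrs(toy_unigene))
```

Primer design then targets the unique flanking sequence on either side of each reported repeat, which is why the start position and repeat length are returned.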

11.
Annotated genomes can provide new perspectives on the biology of species. We present the first de novo whole genome sequencing for the pink-footed goose. In order to obtain a high-quality de novo assembly the strategy used was to combine one short insert paired-end library with two mate-pair libraries. The pink-footed goose genome was assembled de novo using three different assemblers and an assembly evaluation was subsequently performed in order to choose the best assembler. For our data, ALLPATHS-LG performed the best, since the assembly produced covers most of the genome, while introducing the fewest errors. A total of 26,134 genes were annotated, with bird species accounting for virtually all BLAST hits. We also estimated the substitution rate in the pink-footed goose, which can be of use in future demographic studies, by using a comparative approach with the genome of the chicken, the mallard and the swan goose. A substitution rate of 1.38 × 10⁻⁷ per nucleotide per generation was obtained when comparing the genomes of the two closely-related goose species (the pink-footed and the swan goose). Altogether, we provide a valuable tool for future genomic studies aiming at particular genes and regions of the pink-footed goose genome as well as other bird species.

12.
13.
Zhang Hui, Wang Yuexing, Deng Ce, Zhao Sheng, Zhang Peng, Feng Jie, Huang Wei, Kang Shujing, Qian Qian, Xiong Guosheng, Chang Yuxiao 《Science China Life Sciences》2022,65(2):398-411

High-quality rice reference genomes have accelerated the comprehensive identification of genome-wide variations and research on functional genomics and breeding. Tian-you-hua-zhan has been a leading hybrid in China over the past decade. Here, de novo genome assembly strategy optimization for the rice indica lines Huazhan (HZ) and Tianfeng (TF), including sequencing platforms, assembly pipelines and sequence depth, was carried out. The PacBio and Nanopore platforms for long-read sequencing were utilized, with the Canu, wtdbg2, SMARTdenovo, Flye, Canu-wtdbg2, Canu-SMARTdenovo and Canu-Flye assemblers. The combination of PacBio and Canu was optimal, considering the contig N50 length, contig number, assembled genome size and polishing process. The assembled contigs were scaffolded with Hi-C data, resulting in two “golden quality” rice reference genomes, and evaluated using the scaffold N50, BUSCO, and LTR assembly index. Furthermore, 42,625 and 41,815 non-transposable element genes were annotated for HZ and TF, respectively. Based on our assembly of HZ and TF, as well as Zhenshan97, Minghui63, Shuhui498 and 9311, comprehensive variations were identified using Nipponbare as a reference. The de novo assembly strategy for rice we optimized and the “golden quality” rice genomes we produced for HZ and TF will benefit rice genomics and breeding research, especially with respect to uncovering the genomic basis of the elite traits of HZ and TF.


14.
Li R, Gao S, Hernandez AG, Wechter WP, Fei Z, Ling KS 《PLoS ONE》2012,7(5):e37127
Small RNAs (sRNA), including microRNAs (miRNA) and small interfering RNAs (siRNA), are produced abundantly in plants and animals and function in regulating gene expression or in defense against virus or viroid infection. Analysis of siRNA profiles upon virus infection in plants may allow for virus identification, strain differentiation, and de novo assembly of virus genomes. In the present study, four suspected virus-infected tomato samples collected in the U.S. and Mexico were used for sRNA library construction and deep sequencing. Each library generated between 5 and 7 million sRNA reads, of which more than 90% were from the tomato genome. Upon in-silico subtraction of the tomato sRNAs, the remaining highly enriched, virus-like siRNA pools were assembled with or without reference virus or viroid genomes. A complete genome was assembled for Potato spindle tuber viroid (PSTVd) using siRNA alone. In addition, a near-complete virus genome (98%) was assembled for Pepino mosaic virus (PepMV). A common mixed infection of two strains of PepMV (EU and US1), which shared 82% genome nucleotide sequence identity, could also be differentially assembled into their respective genomes. Using de novo assembly, a novel potyvirus with less than 60% overall genome nucleotide sequence identity to other known viruses was discovered and its full genome sequence obtained. Taken together, these data suggest that sRNA deep sequencing technology will likely become an efficient and powerful generic tool for virus identification in plants and animals.

15.
16.
17.

Background

De novo genome assembly of next-generation sequencing data is one of the most important current problems in bioinformatics, essential in many biological applications. In spite of significant amount of work in this area, better solutions are still very much needed.

Results

We present a new program, SAGE, for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.

Conclusions

SAGE benefits from innovations in almost every aspect of the assembly process: error correction of input reads, string-overlap graph construction, read copy counts estimation, overlap graph analysis and reduction, contig extraction, and scaffolding. We hope that these new ideas will help advance the current state-of-the-art in an essential area of research in genomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-302) contains supplementary material, which is available to authorized users.
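
The transitive reduction mentioned in the Results above removes overlap edges that are implied by shorter ones: if read a overlaps b, b overlaps c, and a also overlaps c, the edge a→c carries no extra information. The sketch below is a deliberately naive version that drops an edge whenever a two-step path between the same reads exists. It is not SAGE's efficient algorithm, and the function name and read identifiers are assumptions.

```python
def reduce_transitive_edges(edges):
    """Drop every edge (a, c) for which edges (a, b) and (b, c) also exist.

    edges is a set of (source_read, target_read) pairs in an overlap graph.
    This checks two-step paths only and ignores overlap lengths, which a
    real assembler would take into account.
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    reduced = set(edges)
    for a, b in edges:
        for c in successors.get(b, ()):
            reduced.discard((a, c))  # a -> c is implied by a -> b -> c
    return reduced


# Hypothetical overlaps among three reads laid out left to right.
overlaps = {("r1", "r2"), ("r2", "r3"), ("r1", "r3")}
print(sorted(reduce_transitive_edges(overlaps)))
```

After reduction, unbranched chains of reads can be merged into contigs, which is the step the abstract refers to as edge merging and contig extraction.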

18.
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 MB), S. kudriavzevii (11.18 MB) and C. elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
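
Read depth relates the number of reads, the read length, and the genome size: depth = reads × read length / genome size. The sketch below turns a target depth such as the 50X reported above into a read count, which is the kind of figure needed when planning multiplexing. The function names are assumptions, and the 100 bp read length is a hypothetical value not stated in the abstract.

```python
import math


def reads_needed(genome_size_bp, depth, read_length_bp):
    """Number of reads required to reach a target average depth."""
    return math.ceil(genome_size_bp * depth / read_length_bp)


def depth_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Average depth obtained from a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Genome sizes from the abstract; the 100 bp read length is an assumption.
genomes = {"E. coli": 4_600_000, "S. kudriavzevii": 11_180_000, "C. elegans": 100_000_000}
for name, size in genomes.items():
    print(name, reads_needed(size, depth=50, read_length_bp=100))
```

These are averages over the whole genome; local depth varies with GC content and repeats, so a real design would usually leave some margin above the computed read count.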

19.
Zhang W, Chen J, Yang Y, Tang Y, Shang J, Shen B 《PLoS ONE》2011,6(3):e17915
The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software, especially for de novo fragment assembly. Because little is known about the applicability and performance of these software tools, choosing a suitable assembler is a tough task. Here, we describe the applicability of each program and, above all, compare the performance of eight distinct tools on eight groups of simulated datasets from the Solexa sequencing platform. Considering computational time, maximum random access memory (RAM) occupancy, assembly accuracy and integrity, our study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well-suited for very short reads and for longer reads of small genomes, respectively. For large datasets of more than a hundred million short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, of which SOAPdenovo requires a complex configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offers essential information for the improvement of existing assemblers or the development of novel ones.
