1.
Biao Tang Wei Zhao Huajun Zheng Ying Zhuo Lixin Zhang Guo-Ping Zhao 《Journal of bacteriology》2012,194(20):5699-5700
The genome of Amycolatopsis mediterranei S699 was resequenced and assembled de novo. Comparison with the previously released S699 sequence and with that of A. mediterranei U32 showed that the two S699 genomes differ by about 10 kb of major indels; the differences are likely attributable to their different assembly strategies.
2.
Fatma Onmus-Leone Jun Hang Robert J. Clifford Yu Yang Matthew C. Riley Robert A. Kuschner Paige E. Waterman Emil P. Lesho 《PloS one》2013,8(4)
Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method, we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel blaNDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.
3.
Henriëtte van der Zwan Francois van der Westhuizen Carina Visser Rencia van der Sluis 《Animal biotechnology》2018,29(4):241-246
In aviculture, lovebirds are considered one of the most popular birds to keep. This African parakeet is known for its range of plumage colors and for being easy to tame. Plumage variation is the most important price-determining trait of these birds, and also the main selection criterion for breeders. Currently, no genetic screening tests for traits of economic importance or to confirm pedigree data are available for any of the nine lovebird species. As a starting point to develop these tests, the de novo genome of Agapornis roseicollis (rosy-faced lovebird) was sequenced, assembled, and annotated. Sequencing was done on the Illumina HiSeq 2000 platform and the assembly was performed using SOAPdenovo v2.04. The genome was found to be 1.1 Gb in size, and 16,044 genes were identified and annotated. This compared well with other previously sequenced avian genomes, such as the chicken, zebra finch, and budgerigar. To assess genome completeness, the number of benchmarking universal single-copy orthologs present in the genome was determined. This was compared to other previously assembled avian genomes, and the results indicated that the genome will be useful in the development of genetic screening tests to aid lovebird breeders in selecting breeding pairs.
4.
Zhenglin Du Liang Ma Hongzhu Qu Wei Chen Bing Zhang Xi Lu Weibo Zhai Xin Sheng Yongqiao Sun Wenjie Li Meng Lei Qiuhui Qi Na Yuan Shuo Shi Jingyao Zeng Jinyue Wang Yadong Yang Qi Liu Yaqiang Hong Lili Dong Zhewen Zhang Dong Zou Yanqing Wang Shuhui Song Fan Liu Xiangdong Fang Hua Chen Xin Liu Jingfa Xiao Changqing Zeng 《Genomics, Proteomics & Bioinformatics》2019,17(3):229-247
Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large samples from Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy people from most areas of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from the south, we constructed NH1.0, a new reference genome from a northern individual, by combining the sequencing strategies of PacBio, 10× Genomics, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 significantly correlated with waist circumference in northern Han males. Moreover, significant genetic diversity in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism, was observed between northerners and southerners. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677 T), we hypothesize that there exists a "comfort" zone for a high frequency of 677 T between latitudes of 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.
5.
Next-generation sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among these challenges, GC bias in NGS data is known to hamper genome assembly, but it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias lowers assembly completeness only when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation due to GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation caused by GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias.
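As an illustration of the quantity this abstract discusses, the following minimal Python sketch computes GC content in fixed windows along a sequence. Windows with extreme values correspond to the GC-poor or GC-rich regions that the abstract associates with low read coverage. The function names, window size, and example sequence are assumptions made for illustration and are not taken from the study.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among the A/C/G/T bases of seq.

    Returns None when the sequence contains no A/C/G/T bases
    (for example, a window made only of N characters).
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window, step):
    """Return the GC fraction of each full-length window along seq."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical toy sequence: an AT-only half followed by a GC-only half.
toy_sequence = "AATTGGCC"
print(windowed_gc(toy_sequence, window=4, step=4))  # → [0.0, 1.0]
```

On real data, the window size would typically be far larger than in this toy case, and the per-window GC values would be compared with per-window read depth.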
6.
Next-generation sequencing (NGS) is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires the presence of long reads or long-distance information. To improve de novo genome assembly, we developed a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes as input Illumina paired-end (PE) reads and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments, and >98% of the recovered fragments were perfect. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased the assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. The assembly accuracies dropped, however, although they remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but future error correction is needed for ARF-PE to also increase the assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
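Several abstracts in this list report contiguity as an N50 length. As a reminder of how that statistic is commonly defined, here is a minimal Python sketch: N50 is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name and the example lengths are hypothetical and are not data from any of the studies listed.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (total 1500, half 750).
example_contigs = [100, 200, 300, 400, 500]
print(n50(example_contigs))  # → 400
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about its accuracy, which is why the abstract reports accuracy separately.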
7.
Harish Nagarajan Jessica E. Butler Anna Klimes Yu Qiu Karsten Zengler Joy Ward Nelson D. Young Barbara A. Methé Bernhard Ø. Palsson Derek R. Lovley Christian L. Barrett 《PloS one》2010,5(6)
State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated. We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. Second, we used the output of different short read assembly programs so as to leverage the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm. The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta strategy, wherein sequencing data are integrated as early as possible and in particular ways and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another.
8.
Xiaoteng Fu Jinzhuang Dou Junxia Mao Hailin Su Wenqian Jiao Lingling Zhang Xiaoli Hu Xiaoting Huang Shi Wang Zhenmin Bao 《PloS one》2013,8(11)
Genetic linkage maps are indispensable tools in genetic, genomic and breeding studies. As one of the genotyping-by-sequencing methods, RAD-Seq (restriction-site associated DNA sequencing) has gained particular popularity for construction of high-density linkage maps. Current RAD analytical tools are predominantly used for typing codominant markers. However, no genotyping algorithm has been developed for dominant markers (resulting from recognition site disruption). Given their abundance in eukaryotic genomes, utilization of dominant markers would greatly diminish the extensive sequencing effort required for large-scale marker development. In this study, we established, for the first time, a novel statistical framework for de novo dominant genotyping in mapping populations. An integrated package called RADtyping was developed by incorporating both de novo codominant and dominant genotyping algorithms. We demonstrated the superb performance of RADtyping in achieving remarkably high genotyping accuracy based on simulated and real mapping datasets. The RADtyping package is freely available at http://www2.ouc.edu.cn/mollusk/detailen.asp?id=727.
9.
10.
Aarti Desai Veer Singh Marwah Akshay Yadav Vineet Jha Kishor Dhaygude Ujwala Bangar Vivek Kulkarni Abhay Jere 《PloS one》2013,8(4)
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
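To make the depth figures in this abstract concrete, the sketch below applies the usual relation depth = (number of reads × read length) / genome size and solves it for the number of reads. The genome sizes are those quoted in the abstract; the 100 bp read length and the function name are assumptions chosen for illustration, since the abstract does not state the read length used.

```python
import math


def reads_for_depth(genome_size_bp, depth, read_length_bp):
    """Return the number of reads needed to reach the given average depth."""
    return math.ceil(genome_size_bp * depth / read_length_bp)


# Genome sizes as quoted in the abstract (in base pairs).
genomes = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

READ_LENGTH_BP = 100  # assumed read length, not stated in the abstract
TARGET_DEPTH = 50     # the 50X depth reported for most assemblers

for name, size in genomes.items():
    print(name, reads_for_depth(size, TARGET_DEPTH, READ_LENGTH_BP))
```

For paired-end data, each pair contributes two reads, so the number of pairs required would be half the value returned here.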
11.
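Complete genome sequence analysis of a highly divergent Chinese SV40 isolate (total citations: 2; self-citations: 0; citations by others: 2)
The complete genome nucleotide sequence of YNQD38, an SV40 isolate from Yunnan, China, was determined. Nine overlapping fragments covering the entire genome were amplified and sequenced, the sequence was aligned with those of other SV40 strains, and a phylogenetic tree was constructed from the whole-genome sequences. The genome is 5,125 bp long and is organized like that of other SV40 strains, with six open reading frames and one regulatory region. Compared with other SV40 strains, which have been shown to be highly conserved, YNQD38 shares only 91.0% nucleotide identity across the whole genome. In the conserved regions of SV40, namely VP1, VP2, VP3, the small t antigen (t-ag), and the part of the large T antigen excluding its C terminus, nucleotide identity between YNQD38 and other SV40 strains was 90.7%-91.1%, 91.7%-92.0%, 90.2%-90.8%, 92.8%-93.3%, and 88.5%-89.7%, respectively. In the variable region of SV40, the coding region for the C terminus of the large T antigen (T-ag-C), identity was lower still, at only 65.7%-74.3%. Most nucleotide changes in the conserved regions of YNQD38 were nonsense mutations, whereas most changes in the variable region were sense mutations. The regulatory region of YNQD38 lacks a complete 72-bp enhancer, a regulatory-region structure that has not been reported before. The phylogenetic tree built from the whole genome shows that this virus forms a distinct group. These results indicate that YNQD38 is the most divergent SV40 strain reported to date and the first Chinese SV40 strain to be completely sequenced. This report provides complete and clear molecular data for basic research on Chinese SV40 strains and offers a preliminary discussion of whether such a highly divergent SV40 could become a human pathogen.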
12.
13.
Highly pathogenic porcine reproductive and respiratory syndrome (HP-PRRS) emerged in China in 2006, and HP-PRRS virus (HP-PRRSV) has evolved continuously. Here, the complete genomic sequence of a novel HP-PRRSV field strain, JX, is reported. The present finding will contribute to further studies focusing on the evolutionary mechanism of PRRSV.
14.
Kei-ichiro Mishiba Satoshi Yamasaki Takashi Nakatsuka Yoshiko Abe Hiroyuki Daimon Masayuki Oda Masahiro Nishihara 《PloS one》2010,5(3)
A novel transgene silencing phenomenon was found in the ornamental plant gentian (Gentiana triflora × G. scabra), in which the introduced Cauliflower mosaic virus (CaMV) 35S promoter region was strictly methylated, irrespective of the transgene copy number and integrated loci. Transgenic tobacco carrying the same vector did not show the silencing behavior. Not only unmodified but also modified 35S promoters containing a 35S enhancer sequence were found to be highly methylated in the single-copy transgenic gentian lines. Transgenic lines carrying the 35S core promoter (−90) showed a small degree of methylation, implying that the 35S enhancer sequence was involved in the methylation machinery. The rigorous silencing phenomenon enabled us to analyze methylation in a number of the transgenic lines in parallel, which led to the discovery of a consensus target region for de novo methylation, which comprised an asymmetric cytosine (CpHpH; H is A, C or T) sequence. Consequently, distinct footprints of de novo methylation were detected in each (modified) 35S promoter sequence, and the enhancer region (−148 to −85) was identified as a crucial target for de novo methylation. Electrophoretic mobility shift assay (EMSA) showed that complexes formed in gentian nuclear extract with the −149 to −124 and −107 to −83 region probes were distinct from those of tobacco nuclear extracts, suggesting that the complexes might contribute to de novo methylation. Our results provide insights into the phenomenon of sequence- and species-specific gene silencing in higher plants.
15.
16.
Legumes are a highly diverse angiosperm family that includes many agriculturally important species. To date, 21 complete chloroplast genomes have been sequenced from legume crops confined to the Papilionoideae subfamily. Here we report the first chloroplast genome from the Mimosoideae, Acacia ligulata, and compare it to the previously sequenced legume genomes. The A. ligulata chloroplast genome is 158,724 bp in size, comprising inverted repeats of 25,925 bp and single-copy regions of 88,576 bp and 18,298 bp. Acacia ligulata lacks the inversion present in many of the Papilionoideae, but is not otherwise significantly different in terms of gene and repeat content. The key feature is its highly divergent clpP1 gene, normally considered essential in chloroplast genomes. In A. ligulata, although transcribed and spliced, it probably encodes a catalytically inactive protein. This study provides a significant resource for further genetic research into Acacia and the Mimosoideae. The divergent clpP1 gene suggests that Acacia will provide an interesting source of information on the evolution and functional diversity of the chloroplast Clp protease complex.
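The sizes quoted in this abstract can be checked with simple arithmetic: a chloroplast genome with two copies of the inverted repeat and two single-copy regions has a total length equal to the sum of the single-copy regions plus twice the inverted repeat. The short Python sketch below performs that check with the figures from the abstract. Labelling the two single-copy regions as "large" and "small" follows common convention and is an assumption here, as is the function name.

```python
def quadripartite_length(large_single_copy, small_single_copy, inverted_repeat):
    """Total length of a genome with two single-copy regions and two IR copies."""
    return large_single_copy + small_single_copy + 2 * inverted_repeat


# Figures quoted in the abstract (bp); the LSC/SSC labels are assumed.
LSC_BP = 88_576
SSC_BP = 18_298
IR_BP = 25_925

total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
print(total_bp)  # → 158724
```

The result matches the 158,724 bp genome size stated in the abstract, so the reported component sizes are internally consistent.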
17.
Asan Chunyu Geng Yan Chen Kui Wu Qingle Cai Yu Wang Yongshan Lang Hongzhi Cao Huangming Yang Jian Wang Xiuqing Zhang 《PloS one》2012,7(9)
Background
The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.
Results
We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10–20 kb (and even in extremes, up to ∼35 kb). We also characterized the impact of low quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small-insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.
Conclusions
In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
18.
19.
Yongwen Luo Ying Zhang Xiangyin Liu Youtian Yang Xianfeng Yang Daiting Zhang Xianbo Deng Xiaowei Wu Xiaofeng Guo 《Journal of virology》2012,86(22):12454-12455
A virulent rabies virus (RABV) strain, GD-SH-01, was isolated from brain tissue of a rabid pig in China. This report describes the first complete genome sequence of a swine-origin RABV strain, and this information will provide important insights into the transmission cycle and genetic diversity of RABV from different hosts in China.
20.
Antonio Muñoz-Mérida Juan José González-Plaza Andrés Cañada Ana María Blanco Maria del Carmen García-López José Manuel Rodríguez Laia Pedrola M. Dolores Sicardo M. Luisa Hernández Raúl De la Rosa Angjelina Belaj Mayte Gil-Borja Francisco Luque José Manuel Martínez-Rivas David G. Pisano Oswaldo Trelles Victoriano Valpuesta Carmen R. Beuzón 《DNA research》2013,20(1):93-108
Olive breeding programmes are focused on selecting for traits such as a short juvenile period, plant architecture suited for mechanical harvest, or oil characteristics, including fatty acid composition and phenolic and volatile compounds, to suit new markets. Understanding the molecular basis of these characteristics and improving the efficiency of such breeding programmes require the development of genomic information and tools. However, despite its economic relevance, genomic information on olive or closely related species is still scarce. We have applied Sanger and 454 pyrosequencing technologies to generate close to 2 million reads from 12 cDNA libraries obtained from the Picual, Arbequina, and Lechin de Sevilla cultivars and seedlings from a segregating progeny of a Picual × Arbequina cross. The libraries include fruit mesocarp and seeds at three relevant developmental stages, young stems and leaves, active juvenile and adult buds as well as dormant buds, and juvenile and adult roots. The reads were assembled by library or tissue and then assembled together into 81 020 unigenes with an average size of 496 bases. Here, we report their assembly and their functional annotation.