

Next-generation sequencing (NGS) technology offers ultra-high throughput, but its reads are remarkably short compared with those of conventional Sanger sequencing. Paired-end NGS can computationally extend the read length, but the inherent gaps between the ends cause considerable practical inconvenience. Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, how to fill in the gaps between paired ends to produce accurate long reads is an intriguing but challenging question.


We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It fills in the gaps between paired ends and can generate near error-free sequences equivalent in length to conventional Sanger reads, but with the high throughput of next-generation sequencing. The major novelty of the PS method is that gap filling is based on local assembly of paired-end reads that overlap at either end. Thus, we are able to fill in gaps in repetitive genomic regions correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method is introduced to transform these special paired-end reads into long and near error-free PS sequences, which correspond in length to those with the largest insert sizes. The PS construction has three advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5-fold improvement over the existing de novo assembly methods and a 3-fold advantage over the assembly of long reads from 454 sequencing.
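For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how a contig N50 is commonly computed: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the example contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, chosen only to illustrate the calculation.
example_contigs = [190_000, 120_000, 80_000, 40_000, 10_000]
print(n50(example_contigs))
```

Because N50 depends only on the distribution of contig lengths, it says nothing about whether the contigs are correctly assembled, so it is usually reported alongside accuracy measures.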


Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly benefits substantially from these Sanger-like reads. In addition, these long reads could be applied to tasks such as structural variation detection and metagenomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.

Next-generation sequencing technologies have increased the amount of biological data generated. Thus, bioinformatics has become important because new methods and algorithms are necessary to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads and high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers do not have a graphical interface, which makes their use difficult for users without computing experience given the complexity of the assembler syntax. Thus, to make the operation of such assemblers accessible to users without a computing background, we developed AutoAssemblyD, which is a graphical tool for genome assembly submission and remote management by multiple assemblers through XML templates.


AutoAssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun JDK 6 or higher.

Genome assembly has always been complicated by the inherent difficulties of sequencing technologies, as well as of the computational methods used to process sequences. Although many of the problems in generating contigs from reads are well known, especially those involving short reads, the orientation and ordering of contigs in the finishing stages is still very challenging and time-consuming, as it requires manual curation of the contigs to guarantee their correct identification and prevent misassembly. Because of the large numbers of sequences produced, especially by next-generation sequencers, this process demands considerable manual effort, and few software options are available to facilitate it. To address this problem, we have developed the Graphic Contig Analyzer for All Sequencing Platforms (G4ALL): a stand-alone multi-user tool that facilitates the editing of the contigs produced in the assembly process. Besides providing information on the gene products contained in each contig, obtained through a search of the available biological databases, G4ALL produces a scaffold of the genome, based on the overlap of the contigs after curation.


The software is available at: http://www.genoma.ufpa.br/rramos/softwares/g4all.xhtml



Second-generation sequencers generate millions of relatively short, but error-prone, reads. These errors make sequence assembly and other downstream projects more challenging. Correcting these errors improves the quality of assemblies and projects which benefit from error-free reads.


We have developed a general-purpose error corrector that corrects errors introduced by Illumina, Ion Torrent, and Roche 454 sequencing technologies and can be applied to single- or mixed-genome data. In addition to correcting substitution errors, we locate and correct insertion, deletion, and homopolymer errors while remaining sensitive to low-coverage areas of sequencing projects. Using published data sets, we correct 94% of Illumina MiSeq errors, 88% of Ion Torrent PGM errors, and 85% of Roche 454 GS Junior errors. Introduced errors are 20 to 70 times rarer than successfully corrected errors. Furthermore, we show that the quality of assemblies improves when reads are corrected by our software.


Pollux is highly effective at correcting errors across platforms, and consistently performs as well as or better than currently available error correction software. Pollux provides general-purpose error correction and may be used in applications with or without assembly.

The draft sequence of several complete protozoan genomes is now available and genome projects are ongoing for a number of other species. Different strategies are being implemented to identify and annotate protein coding and RNA genes in these genomes, as well as study their genomic architecture. Since the genomes vary greatly in size, GC-content, nucleotide composition, and degree of repetitiveness, genome structure is often a factor in choosing the methodology utilised for annotation. In addition, the approach taken is dictated, to a greater or lesser extent, by the particular reasons for carrying out genome-wide analyses and the level of funding available for projects. Nevertheless, these projects have provided a plethora of material that will aid in understanding the biology and evolution of these parasites, as well as identifying new targets that can be used to design urgently required drug treatments for the diseases they cause.

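[Objective] To investigate the effects of feed fermented with a mixed bacterial culture on colonic fermentation and on the bacterial community composition of the colonic mucosa and colonic contents in growing-finishing pigs. [Methods] Volatile fatty acid concentrations in the colonic contents of finishing pigs were measured by gas chromatography; the bacterial community composition of the colonic mucosa and contents was determined by MiSeq high-throughput sequencing. [Results] Feeding fermented feed had no significant effect on bacterial diversity in the colonic mucosa or contents (P > 0.05); it significantly increased the relative abundance of Weissella and Faecalibacterium in the colonic mucosa (P < 0.05) and increased the relative abundance of Weissella and Subdoligranulum in the colonic contents (P < 0.05). Feeding fermented feed had no significant effect on pH or on the concentrations of lactate, acetate, propionate, isobutyrate, valerate, isovalerate and total volatile fatty acids in the colonic contents (P > 0.05), but significantly increased the butyrate level in the colonic contents (P < 0.05). [Conclusion] Feeding feed fermented with a mixed bacterial culture can, to some extent, alter the bacterial community composition in the colon of finishing pigs and promote butyrate production, thereby improving intestinal health.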



Molecular marker-assisted breeding provides an efficient tool to develop improved crop varieties. A major challenge for the broad application of markers in marker-assisted selection is that the marker phenotypes must match plant phenotypes in a wide range of breeding germplasm. In this study, we used the legume crop species Lupinus angustifolius (lupin) to demonstrate the utility of whole genome sequencing and re-sequencing on the development of diagnostic markers for molecular plant breeding.


Nine lupin cultivars released in Australia from 1973 to 2007 were subjected to whole genome re-sequencing. The re-sequencing data together with the reference genome sequence data were used in marker development, which revealed 180,596 to 795,735 SNP markers from pairwise comparisons among the cultivars. A total of 207,887 markers were anchored on the lupin genetic linkage map. Marker mining obtained an average of 387 SNP markers and 87 InDel markers for each of the 24 genome sequence assembly scaffolds bearing markers linked to 11 genes of agronomic interest. Using the R gene PhtjR conferring resistance to phomopsis stem blight disease as a test case, we discovered 17 candidate diagnostic markers by genotyping and selecting markers on a genetic linkage map. A further 243 candidate diagnostic markers were discovered by marker mining on a scaffold bearing non-diagnostic markers linked to the PhtjR gene. Nine of the ten tested candidate diagnostic markers were confirmed as truly diagnostic on a broad range of commercial cultivars. Markers developed using these strategies meet the requirements for broad application in molecular plant breeding.


We demonstrated that low-cost genome sequencing and re-sequencing data were sufficient and very effective in the development of diagnostic markers for marker-assisted selection. The strategies used in this study may be applied to any trait or plant species. Whole genome sequencing and re-sequencing provides a powerful tool to overcome current limitations in molecular plant breeding, which will enable plant breeders to precisely pyramid favourable genes to develop super crop varieties to meet future food demands.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1878-5) contains supplementary material, which is available to authorized users.



Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned.


We built a reference genome catalog suitable for short read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products, belonging to 137 different species and 67 genera, and succeeded in reconstructing the draft genomes of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, which are still missing from public databases. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study.


Our study provides data which, combined with publicly available genome references, represent the most expansive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheese of dominant microorganisms not deliberately inoculated, mainly Gram-negative bacteria such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, that may contribute to the characteristics of cheese produced through traditional methods.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1101) contains supplementary material, which is available to authorized users.



The sulfate-reducing bacterium Desulfococcus biacutus is able to utilize acetone for growth by an inducible degradation pathway that involves a novel activation reaction for acetone with CO as a co-substrate. The mechanism, enzyme(s) and gene(s) involved in this acetone activation reaction are of great interest because they represent a novel and yet undefined type of activation reaction under strictly anoxic conditions.


In this study, a draft genome sequence of D. biacutus was established. Sequencing, assembly and annotation resulted in 159 contigs with 5,242,029 base pairs and 4773 predicted genes; 4708 were predicted protein-encoding genes, and 3520 of these had a functional prediction. Proteins and genes were identified that are specifically induced during growth with acetone. A thiamine diphosphate-requiring enzyme appeared to be highly induced during growth with acetone and is probably involved in the activation reaction. Moreover, a coenzyme B12-dependent enzyme and proteins that are involved in redox reactions were also induced during growth with acetone.


We present for the first time the genome of a sulfate reducer that is able to grow with acetone. The genome information of this organism represents an important tool for the elucidation of a novel reaction mechanism that is employed by a sulfate reducer in acetone activation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-584) contains supplementary material, which is available to authorized users.



CRISPR-Cas9 is a revolutionary genome editing technique that allows for efficient and directed alterations of the eukaryotic genome. This relatively new technology has already been used in a large number of ‘loss of function’ experiments in cultured cells. Despite its simplicity and efficiency, screening for mutated clones remains time-consuming, laborious and/or expensive.


Here we report a high-throughput screening strategy that allows parallel screening of up to 96 clones using next-generation sequencing. As a proof of principle, we used CRISPR-Cas9 to disrupt the coding sequence of the homeobox gene Evx1 in mouse embryonic stem cells. We screened 67 CRISPR-Cas9 transfected clones simultaneously by next-generation sequencing on the Ion Torrent PGM. We were able to identify both homozygous and heterozygous Evx1 mutants, as well as mixed clones, which must be identified to maintain the integrity of subsequent experiments.


Our CRISPR-Cas9 screening strategy could be widely applied to screen for CRISPR-Cas9 mutants in a variety of contexts, including the generation of mutant cell lines for in vitro research, the generation of transgenic organisms and the assessment of the veracity of CRISPR-Cas9 homology directed repair. This technique is cost- and time-effective, provides information on clonal heterogeneity and is adaptable for use on various sequencing platforms.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1002) contains supplementary material, which is available to authorized users.

Pseudogenes are frequently encountered noncoding sequences with a high sequence similarity to their protein-coding paralogue. For this reason, their presence is often considered troublesome in molecular diagnostics. In pseudoxanthoma elasticum (PXE), a disease predominantly caused by mutations in ATP-binding cassette family C member 6 (ABCC6), the presence of two pseudogenes complicates the analysis of sequence data. With whole-exome sequencing (WES) becoming the standard of care in molecular diagnostics, we wanted to evaluate whether this technique is as reliable as gene-specific targeted enrichment analysis for the analysis of ABCC6. We established a PCR-based targeted enrichment and next-generation sequencing testing approach and demonstrated that the ABCC6-specific enrichment combined with the applied mapping algorithm overcomes the complication of ABCC6 pseudogene aspecificities, contrary to WES. We propose a time- and cost-efficient diagnostic strategy for comprehensive and accurate molecular genetic testing of PXE, which is highly automatable.

As next-generation sequencing (NGS) technology has become widely used to identify genetic causal variants for various diseases and traits, a number of packages for checking NGS data quality have sprung up in public domains. In addition to the quality of sequencing data, sample quality issues, such as gender mismatch, abnormal inbreeding coefficient, cryptic relatedness, and population outliers, can also have fundamental impact on downstream analysis. However, there is a lack of tools specialized in identifying problematic samples from NGS data, often due to the limitation of sample size and variant counts. We developed SeqSQC, a Bioconductor package, to automate and accelerate sample cleaning in NGS data of any scale. SeqSQC is designed for efficient data storage and access, and equipped with interactive plots for intuitive data visualization to expedite the identification of problematic samples. SeqSQC is available at http://bioconductor.org/packages/SeqSQC.

The vast amount of data produced by next-generation sequencing (NGS) has necessitated the development of computational tools to assist in understanding the myriad functions performed by the biological macromolecules involved in heredity. In this work, we developed the FunSys programme, a stand-alone tool with a user-friendly interface that enables us to evaluate and correlate differential expression patterns from RNA sequencing and proteomics datasets. FunSys generates charts and reports based on the results of the differential expression analysis to aid their interpretation. AVAILABILITY: The database is available for free at https://sourceforge.net/projects/funsysufpa/



DNA-based methods like PCR efficiently identify and quantify the taxon composition of complex biological materials, but are limited to detecting species targeted by the choice of the primer assay. We show here how untargeted deep sequencing of foodstuff total genomic DNA, followed by bioinformatic analysis of sequence reads, facilitates highly accurate identification of species from all kingdoms of life, at the same time enabling quantitative measurement of the main ingredients and detection of unanticipated food components.


Sequence data simulation and real-case Illumina sequencing of DNA from reference sausages composed of mammalian (pig, cow, horse, sheep) and avian (chicken, turkey) species are able to quantify material correctly at the 1% discrimination level via a read counting approach. An additional metagenomic step facilitates identification of traces from animal, plant and microbial DNA including unexpected species, which is prospectively important for the detection of allergens and pathogens.
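The read counting approach mentioned above can be illustrated with a minimal Python sketch: each read is assumed to have already been assigned to one species (for example by mapping to reference genomes), and the share of reads per species serves as a rough estimate of composition. The function name `species_fractions` and the example counts are hypothetical. The sketch deliberately omits any normalisation for genome size, mitochondrial content or mapping bias that a real analysis might apply.

```python
from collections import Counter


def species_fractions(read_assignments):
    """Return the fraction of reads assigned to each species.

    `read_assignments` is an iterable of species labels, one per read.
    This is a plain proportion of read counts; no correction for genome
    size or mapping bias is attempted (an assumption of this sketch).
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any species")
    return {species: n / total for species, n in counts.items()}


# Hypothetical assignments for a sample with a 1% horse admixture.
reads = ["pig"] * 89 + ["cow"] * 10 + ["horse"] * 1
for species, fraction in sorted(species_fractions(reads).items()):
    print(f"{species}: {fraction:.1%}")
```

With real data, reads that map equally well to several species would need an explicit rule (discarding them or splitting their weight), which this sketch leaves out.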


Our data suggest that deep sequencing of total genomic DNA from samples of heterogeneous taxon composition promises to be a valuable screening tool for reference species identification and quantification in biosurveillance applications like food testing, potentially alleviating some of the problems in taxon representation and quantification associated with targeted PCR-based approaches.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-639) contains supplementary material, which is available to authorized users.

Laboratories working with draft phase genomes have specific software needs, such as the unattended processing of hundreds of single scaffolds and subsequent sequence annotation. In addition, it is critical to follow the "movement" and the manual annotation of single open reading frames (ORFs) within the successive sequence updates. Even with finished genomes, regular database updates can lead to significant changes in the annotation of single ORFs. In functional genomics it is important to mine data and identify new genetic targets rapidly and easily. Often there is no need for sophisticated relational databases (RDB) that greatly reduce the system-independent access of the results. Another aspect is the internet dependency of most software packages. If users are working with confidential data, this dependency poses a security issue. GAMOLA was designed to handle the numerous scaffolds and changing contents of draft phase genomes in an automated process, and it stores the results for each predicted ORF in flatfile databases. In addition, annotation transfers, ORF designation tracking, Blast comparisons, and primer design for whole genome microarrays have been implemented. The software is available under the license of North Carolina State University. A website and a downloadable example are accessible at http://fsweb2.schaub.ncsu.edu/TRKwebsite/index.htm.

Next-generation sequencing (NGS) has caused a revolution in biology. NGS requires the preparation of libraries in which (fragments of) DNA or RNA molecules are fused with adapters, followed by PCR amplification and sequencing. It is evident that robust library preparation methods that produce a representative, non-biased source of nucleic acid material from the genome under investigation are of crucial importance. Nevertheless, it has become clear that NGS libraries for all types of applications contain biases that compromise the quality of NGS datasets and can lead to their erroneous interpretation. A detailed knowledge of the nature of these biases will be essential for a careful interpretation of NGS data on the one hand, and will help to find ways to improve library quality or to develop bioinformatics tools to compensate for the bias on the other. In this review we discuss the literature on bias in the most common NGS library preparation protocols, both for DNA sequencing (DNA-seq) and for RNA sequencing (RNA-seq). Strikingly, almost all steps of the various protocols have been reported to introduce bias, especially in the case of RNA-seq, which is technically more challenging than DNA-seq. For each type of bias we discuss methods for improvement with a view to providing some useful advice to the researcher who wishes to convert any kind of raw nucleic acid into an NGS library.
