首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieved 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel bla NDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.  相似文献   

2.
Libraries constructed in bacterial artificial chromosome (BAC) vectors have become the choice for clone sets in high throughput genomic sequencing projects primarily because of their high stability. BAC libraries have been proposed as a source for minimally over-lapping clones for sequencing large genomic regions, and the use of BAC end sequences (i.e. sequences adjoining the insert sites) has been proposed as a primary means for selecting minimally overlapping clones for sequencing large genomic regions. For this strategy to be effective, high throughput methods for BAC end sequencing of all the clones in deep coverage BAC libraries needed to be developed. Here we describe a low cost, efficient, 96 well procedure for BAC end sequencing. These methods allow us to generate BAC end sequences from human and Arabidoposis libraries with an average read length of >450 bases and with a single pass sequencing average accuracy of >98%. Application of BAC end sequences in genomic sequen-cing is discussed.  相似文献   

3.

Background

Usually, next generation sequencing (NGS) technology has the property of ultra-high throughput but the read length is remarkably short compared to conventional Sanger sequencing. Paired-end NGS could computationally extend the read length but with a lot of practical inconvenience because of the inherent gaps. Now that Illumina paired-end sequencing has the ability of read both ends from 600 bp or even 800 bp DNA fragments, how to fill in the gaps between paired ends to produce accurate long reads is intriguing but challenging.

Results

We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It tries to fill in the gaps between paired ends and could generate near error-free sequences equivalent to the conventional Sanger reads in length but with the high throughput of the Next Generation Sequencing. The major novelty of PS method lies on that the gap filling is based on local assembly of paired-end reads which have overlaps with at either end. Thus, we are able to fill in the gaps in repetitive genomic region correctly. The PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method is introduced to transform these special paired-end reads into long and near error-free PS sequences, which correspond in length to those with the largest insert sizes. The PS construction has 3 advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5 fold improvement over the existing de novo assembly methods and a 3 fold advantage over the assembly of long reads from 454 sequencing.

Conclusions

Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly could benefit a lot from these Sanger-like reads. Besides, the characteristic of the long reads could be applied to such applications as structural variations detection and metagenomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.  相似文献   

4.

Background

The draft genome sequence of the ascidian Ciona intestinalis, along with associated gene models, has been a valuable research resource. However, recently accumulated expressed sequence tag (EST)/cDNA data have revealed numerous inconsistencies with the gene models due in part to intrinsic limitations in gene prediction programs and in part to the fragmented nature of the assembly.

Results

We have prepared a less-fragmented assembly on the basis of scaffold-joining guided by paired-end EST and bacterial artificial chromosome (BAC) sequences, and BAC chromosomal in situ hybridization data. The new assembly (115.2 Mb) is similar in length to the initial assembly (116.7 Mb) but contains 1,272 (approximately 50%) fewer scaffolds. The largest scaffold in the new assembly incorporates 95 initial-assembly scaffolds. In conjunction with the new assembly, we have prepared a greatly improved global gene model set strictly correlated with the extensive currently available EST data. The total gene number (15,254) is similar to that of the initial set (15,582), but the new set includes 3,330 models at genomic sites where none were present in the initial set, and 1,779 models that represent fusions of multiple previously incomplete models. In approximately half, 5'-ends were precisely mapped using 5'-full-length ESTs, an important refinement even in otherwise unchanged models.

Conclusion

Using these new resources, we identify a population of non-canonical (non-GT-AG) introns and also find that approximately 20% of Ciona genes reside in operons and that operons contain a high proportion of single-exon genes. Thus, the present dataset provides an opportunity to analyze the Ciona genome much more precisely than ever.  相似文献   

5.
Human BAC ends quality assessment and sequence analyses   总被引:8,自引:0,他引:8  
Zhao S  Malek J  Mahairas G  Fu L  Nierman W  Venter JC  Adams MD 《Genomics》2000,63(3):321-332
End sequences from bacterial artificial chromosomes (BACs) provide highly specific sequence markers in large-scale sequencing projects. To date, we have generated >300,000 end sequences from >186,000 human BAC clones with an average read length of >460 bp for a total of 141 Mb covering approximately 4.7% of the genome. Over 60% of the clones have BAC end sequences (BESs) from both ends representing more than fivefold coverage of the human genome by the paired-end clones. Our quality assessments and sequence analyses indicate that BESs from human BAC libraries developed at The California Institute of Technology (CalTech) and Roswell Park Cancer Institute have similar properties. The analyses have highlighted differences in insert size for different segments of the CalTech library. Problems with the fidelity of tracking of sequence data back to physical clones have been observed in some subsets of the overall BES dataset. The annotation results of BESs for the contents of available genomic sequences, sequence tagged sites, expressed sequence tags, protein encoding regions, and repeats indicate that this resource will be valuable in many areas of genome research.  相似文献   

6.
Although new and emerging next-generation sequencing (NGS) technologies have reduced sequencing costs significantly, much work remains to implement them for de novo sequencing of complex and highly repetitive genomes such as the tetraploid genome of Upland cotton (Gossypium hirsutum L.). Herein we report the results from implementing a novel, hybrid Sanger/454-based BAC-pool sequencing strategy using minimum tiling path (MTP) BACs from Ctg-3301 and Ctg-465, two large genomic segments in A12 and D12 homoeologous chromosomes (Ctg). To enable generation of longer contig sequences in assembly, we implemented a hybrid assembly method to process ~35x data from 454 technology and 2.8-3x data from Sanger method. Hybrid assemblies offered higher sequence coverage and better sequence assemblies. Homology studies revealed the presence of retrotransposon regions like Copia and Gypsy elements in these contigs and also helped in identifying new genomic SSRs. Unigenes were anchored to the sequences in Ctg-3301 and Ctg-465 to support the physical map. Gene density, gene structure and protein sequence information derived from protein prediction programs were used to obtain the functional annotation of these genes. Comparative analysis of both contigs with Arabidopsis genome exhibited synteny and microcollinearity with a conserved gene order in both genomes. This study provides insight about use of MTP-based BAC-pool sequencing approach for sequencing complex polyploid genomes with limited constraints in generating better sequence assemblies to build reference scaffold sequences. Combining the utilities of MTP-based BAC-pool sequencing with current longer and short read NGS technologies in multiplexed format would provide a new direction to cost-effectively and precisely sequence complex plant genomes.  相似文献   

7.
Bacterial artificial chromosome (BAC) cloning systems currently in use generate high quality genomic libraries for gene mapping, identification, and sequencing. However, the most commonly used BAC cloning systems do not facilitate functional studies in eukaryotic cells. To overcome this limitation, we have developed pEBAC190G, a new BAC vector that combines the features of the first generation PAC/BAC vectors with eukaryotic elements that facilitate the transfection, episomal maintenance, and functional analysis of large genomic fragments in eukaryotic cells. A number of different cloning strategies may be used to retrofit genomic fragments from existing libraries into the new vector. The system was tested by the retrofitting of a 170kb NotI genomic fragment from the RPCI-11 BAC library into the NotI site of pEBAC190G. Clones from any eukaryotic genomic library harboured in this vector can be transferred from bacteria directly to eukaryotic cells for functional analysis.  相似文献   

8.

Background

Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood.

Results

We used 250 bp Illumina Miseq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes cause insertion or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results.

Conclusions

In summary, this study provides systematic evaluations of de novo assembly from paired end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacteria genome assembly and should be avoided or used with caution especially for genomes containing high fraction of repetitive sequences. Because increasing numbers of projects aim at bacteria genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1859-8) contains supplementary material, which is available to authorized users.  相似文献   

9.
We have developed the software package Tomato and Potato Assembly Assistance System (TOPAAS), which automates the assembly and scaffolding of contig sequences for low-coverage sequencing projects. The order of contigs predicted by TOPAAS is based on read pair information; alignments between genomic, expressed sequence tags, and bacterial artificial chromosome (BAC) end sequences; and annotated genes. The contig scaffold is used by TOPAAS for automated design of nonredundant sequence gap-flanking PCR primers. We show that TOPAAS builds reliable scaffolds for tomato (Solanum lycopersicum) and potato (Solanum tuberosum) BAC contigs that were assembled from shotgun sequences covering the target at 6- to 8-fold coverage. More than 90% of the gaps are closed by sequence PCR, based on the predicted ordering information. TOPAAS also assists the selection of large genomic insert clones from BAC libraries for walking. For this, tomato BACs are screened by automated BLAST analysis and in parallel, high-density nonselective amplified fragment length polymorphism fingerprinting is used for constructing a high-resolution BAC physical map. BLAST and amplified fragment length polymorphism analysis are then used together to determine the precise overlap. Assembly onto the seed BAC consensus confirms the BACs are properly selected for having an extremely short overlap and largest extending insert. This method will be particularly applicable where related or syntenic genomes are sequenced, as shown here for the Solanaceae, and potentially useful for the monocots Brassicaceae and Leguminosea.  相似文献   

10.
Traditional approaches for sequencing insertion ends of bacterial artificial chromosome (BAC) libraries are laborious and expensive, which are currently some of the bottlenecks limiting a better understanding of the genomic features of auto‐ or allopolyploid species. Here, we developed a highly efficient and low‐cost BAC end analysis protocol, named BAC‐anchor, to identify paired‐end reads containing large internal gaps. Our approach mainly focused on the identification of high‐throughput sequencing reads carrying restriction enzyme cutting sites and searching for large internal gaps based on the mapping locations of both ends of the reads. We sequenced and analysed eight libraries containing over 3 200 000 BAC end clones derived from the BAC library of the tetraploid potato cultivar C88 digested with two restriction enzymes, Cla I and Mlu I. About 25% of the BAC end reads carrying cutting sites generated a 60–100 kb internal gap in the potato DM reference genome, which was consistent with the mapping results of Sanger sequencing of the BAC end clones and indicated large differences between autotetraploid and haploid genotypes in potato. A total of 5341 Cla I‐ and 165 Mlu I‐derived unique reads were distributed on different chromosomes of the DM reference genome and could be used to establish a physical map of target regions and assemble the C88 genome. The reads that matched different chromosomes are especially significant for the further assembly of complex polyploid genomes. Our study provides an example of analysing high‐coverage BAC end libraries with low sequencing cost and is a resource for further genome sequencing studies.  相似文献   

11.
12.
Bacterial artificial chromosome (BAC) library is an important tool in genomic research. We constructed two libraries from the genomic DNA of grass carp (Ctenopharyngodon idellus) as a crucial part of the grass carp genome project. The libraries were constructed in the EcoRI and HindIII sites of the vector CopyControl pCC1BAC. The EcoRI library comprised 53,000 positive clones, and approximately 99.94% of the clones contained grass carp nuclear DNA inserts (average size, 139.7 kb) covering 7.4× haploid genome equivalents and 2% empty clones. Similarly, the HindIII library comprised 52,216 clones with approximately 99.82% probability of finding any genomic fragments containing single-copy genes; the average insert size was 121.5 kb with 2.8% insert-empty clones, thus providing genome coverage of 6.3× haploid genome equivalents of grass carp. We selected gene-specific probes for screening the target gene clones in the HindIII library. In all, we obtained 31 positive clones, which were identified for every gene, with an average of 6.2 BAC clones per gene probe. Thus, we succeeded in constructing the desired BAC libraries, which should provide an important foundation for future physical mapping and whole-genome sequencing in grass carp.  相似文献   

13.

Background

Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.

Principal Findings

We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.

Conclusions

These algorithms extend the utility of short read only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.  相似文献   

14.
Rice is an important crop and a model system for monocot genomics, and is a target for whole genome sequencing by the International Rice Genome Sequencing Project (IRGSP). The IRGSP is using a clone by clone approach to sequence rice based on minimum tiles of BAC or PAC clones. For chromosomes 10 and 3 we are using an integrated physical map based on two fingerprinted and end-sequenced BAC libraries to identifying a minimum tiling path of clones. In this study we constructed and tested two rice genomic libraries with an average insert size of 10 kb (10-kb library) to support the gap closure and finishing phases of the rice genome sequencing project. The HaeIII library contains 166,752 clones covering approximately 4.6x rice genome equivalents with an average insert size of 10.5 kb. The Sau3AI library contains 138,960 clones covering 4.2x genome equivalents with an average insert size of 11.6 kb. Both libraries were gridded in duplicate onto 11 high-density filters in a 5 x 5 pattern to facilitate screening by hybridization. The libraries contain an unbiased coverage of the rice genome with less than 5% contamination by clones containing organelle DNA or no insert. An efficient method was developed, consisting of pooled overgo hybridization, the selection of 10-kb gap spanning clones using end sequences, transposon sequencing and utilization of in silico draft sequence, to close relatively small gaps between sequenced BAC clones. Using this method we were able to close a majority of the gaps (up to approximately 50 kb) identified during the finishing phase of chromosome-10 sequencing. This method represents a useful way to close clone gaps and thus to complete the entire rice genome.  相似文献   

15.
To isolate genes of interest in plants, it is essential to construct bacterial artificial chromosome (BAC) libraries from specific genotypes. Construction and organisation of BAC libraries is laborious and costly, especially from organisms with large and complex genomes. In the present study, we developed the pooled BAC library strategy that allows rapid and low cost generation and screening of genomic libraries from any genotype of interest. The BAC library is constructed, directly organised into a few pools and screened for BAC clones of interest using PCR and hybridisation steps, without requiring organization into individual clones. As a proof of concept, a pooled BAC library of approximately 177,000 recombinant clones has been constructed from the barley cultivar Cebada Capa that carries the Rph7 leaf rust resistance gene. The library has an average insert size of 140 kb, a coverage of six barley genome equivalents and is organised in 138 pools of about 1,300 clones each. We rapidly established a single contig of six BAC clones spanning 230 kb at the Rph7 locus on chromosome 3HS. The described low-cost cloning strategy is fast and will greatly facilitate direct targeting of genes and large-scale intra- and inter-species comparative genome analysis.Edwige Isidore and Beatrice Scherrer contributed equally to the work.  相似文献   

16.
《Journal of Asia》2020,23(3):816-824
Leptotrombidium pallidum is the major vector mite for Orientia tsutsugamushi, the causative agent of scrub typhus, in Asian countries, including Korea. Despite its medical importance, L. pallidum has little genetic information available to date. To analyze the L. pallidum genome, we extracted genomic DNA (gDNA) from a single female of a 7-generation inbred L. pallidum colony and amplified the gDNA by whole genome amplification (WGA). The resulting amplified gDNA was used to construct paired-end and mate-pair libraries that were sequenced using Illumina platforms (HiSeq2000 and MiSeq). More than 45 Gb of sequence reads from both paired-end and mate-pair libraries of the WGA gDNA were trimmed and then de novo assembled using CLC Assembly Cell v.4.0 for contig assembly and SSPACE for scaffolding. The assembly generated approximately 6,545 scaffolds with an N50 value of 92,945 and total size of ~ 193 Mb. For gene predictions, the PASA and GeneWise models were used, and ab initio gene predictions were performed independently, resulting in the prediction of 15,842 genes. RNA-Seq expression profiles revealed constitutive expression of 11,572 unique protein-coding genes in larva, 12,364 in protonymph, 12,872 in male adult, and 12,617 in female adult stages. Of the 15,842 predicted genes, 10,885 were commonly expressed through all L. pallidum stages. Genes selectively over-transcribed in the larval stage, which is when host parasitization and disease transmission occur, were further annotated, and their putative roles were discussed.  相似文献   

17.
Dicentrarchus labrax is one of the major marine aquaculture species in the European Union. In this study, we have developed a directed-sequencing strategy to sequence three sea bass chromosomes and compared results with other teleosts.Three BAC DNA pools were created from sea bass BAC clones that mapped to stickleback chromosomes/groups V, XVII and XXI. The pools were sequenced to 17-39x coverage by pyrosequencing. Data assembly was supported by Sanger reads and mate pair data and resulted in superscaffolds of 13.2 Mb, 17.5 Mb and 13.7 Mb respectively. Annotation features of the superscaffolds include 1477 genes. We analyzed size change of exon, intron and intergenic sequence between teleost species and deduced a simple model for the evolution of genome composition in teleost lineage.Combination of second generation sequencing technologies, Sanger sequencing and genome partitioning strategies allows “high-quality draft assemblies” of chromosome-sized superscaffolds, which are crucial for the prediction and annotation of complete genes.  相似文献   

18.
As a new developmental vector system, the bacterial artificial chromosome (BAC) has been used widely in constructing genomic libraries and in generating transgenic animals. Isolation of the BAC insert end is useful to analyze the BAC clone. Here, we describe a fast and efficient method to obtain the BAC end by ligating the BAC fragments digested with Not I and another selected restriction enzyme into universal cloning vector, followed by determining the correct clones with HindIII digestion. Further DNA sequencing analysis verified the results mentioned above.  相似文献   

19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号