首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
《Genome biology》2014,15(3):R59

Background

The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

Results

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.

Conclusions

In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.  相似文献   

2.
A bacterial artificial chromosome (BAC) library was constructed by cloning HindIII-digested high molecular weight DNA from a gynogenetic channel catfish, Ictalurus punctatus, into the vector pBeloBAC11. Approximately 53 500 clones were arrayed in 384-well plates and stored at -80°C (CCBL1), while clones from a smaller insert size fraction were stored at -80°C without arraying (CCBL2). Pulsed-field gel electrophoresis of 100 clones after NotI digestion revealed an average insert size of 165 kb for CCBL1 and 113 kb for CCBL2. Further characterization of CCBL1 demonstrated that 10% of the clones did not contain an insert. CCBL1 provides a 7.2-fold coverage of the channel catfish haploid genome. PCR-based screening demonstrated that 68 out of 74 unique loci were present in the library. This represents a 92% chance to find a unique sequence. These libraries will be useful for physical mapping of the channel catfish genome, and identification of genes controlling major traits in this economically important species.  相似文献   

3.

Background

Cost effective next generation sequencing technologies now enable the production of genomic datasets for many novel planktonic eukaryotes, representing an understudied reservoir of genetic diversity. O. tauri is the smallest free-living photosynthetic eukaryote known to date, a coccoid green alga that was first isolated in 1995 in a lagoon by the Mediterranean sea. Its simple features, ease of culture and the sequencing of its 13 Mb haploid nuclear genome have promoted this microalga as a new model organism for cell biology. Here, we investigated the quality of genome assemblies of Illumina GAIIx 75 bp paired-end reads from Ostreococcus tauri, thereby also improving the existing assembly and showing the genome to be stably maintained in culture.

Results

The 3 assemblers used, ABySS, CLCBio and Velvet, produced 95% complete genomes in 1402 to 2080 scaffolds with a very low rate of misassembly. Reciprocally, these assemblies improved the original genome assembly by filling in 930 gaps. Combined with additional analysis of raw reads and PCR sequencing effort, 1194 gaps have been solved in total adding up to 460 kb of sequence. Mapping of RNAseq Illumina data on this updated genome led to a twofold reduction in the proportion of multi-exon protein coding genes, representing 19% of the total 7699 protein coding genes. The comparison of the DNA extracted in 2001 and 2009 revealed the fixation of 8 single nucleotide substitutions and 2 deletions during the approximately 6000 generations in the lab. The deletions either knocked out or truncated two predicted transmembrane proteins, including a glutamate-receptor like gene.

Conclusion

High coverage (>80 fold) paired-end Illumina sequencing enables a high quality 95% complete genome assembly of a compact ~13 Mb haploid eukaryote. This genome sequence has remained stable for 6000 generations of lab culture.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1103) contains supplementary material, which is available to authorized users.  相似文献   

4.
The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high‐throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated with over 98% of all probes designed from P. taeda that were efficient in sequence capture, were also suitable for analysis of the related species Pinus elliottii Engelm.  相似文献   

5.
Advances in both high-throughput sequencing and whole-genome amplification (WGA) protocols have allowed genomes to be sequenced from femtograms of DNA, for example from individual cells or from precious clinical and archived samples. Using the highly curated Caenorhabditis elegans genome as a reference, we have sequenced and identified errors and biases associated with Illumina library construction, library insert size, different WGA methods and genome features such as GC bias and simple repeat content. Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, while GC content only influences sequencing in long-insert libraries. Nevertheless, single nucleotide polymorphism (SNP) calls and assembly metrics from reads in amplified libraries show comparable results with unamplified libraries. To utilize the full potential of WGA to reveal the real biological interest, this article highlights the importance of recognizing additional sources of errors from amplified sequence reads and discusses the potential implications in downstream analyses.  相似文献   

6.
The genome sequence of silkworm, Bombyx mori.   总被引:21,自引:0,他引:21  
We performed threefold shotgun sequencing of the silkworm (Bombyx mori) genome to obtain a draft sequence and establish a basic resource for comprehensive genome analysis. By using the newly developed RAMEN assembler, the sequence data derived from whole-genome shotgun (WGS) sequencing were assembled into 49,345 scaffolds that span a total length of 514 Mb including gaps and 387 Mb without gaps. Because the genome size of the silkworm is estimated to be 530 Mb, almost 97% of the genome has been organized in scaffolds, of which 75% has been sequenced. By carrying out a BLAST search for 50 characteristic Bombyx genes and 11,202 non-redundant expressed sequence tags (ESTs) in a Bombyx EST database against the WGS sequence data, we evaluated the validity of the sequence for elucidating the majority of silkworm genes. Analysis of the WGS data revealed that the silkworm genome contains many repetitive sequences with an average length of <500 bp. These repetitive sequences appear to have been derived from truncated transposons, which are interspersed at 2.5- to 3-kb intervals throughout the genome. This pattern suggests that silkworm may have an active mechanism that promotes removal of transposons from the genome. We also found evidence for insertions of mitochondrial DNA fragments at 9 sites. A search for Bombyx orthologs to Drosophila genes controlling sex determination in the WGS data revealed 11 Bombyx genes and suggested that the sex-determining systems differ profoundly between the two species.  相似文献   

7.
Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062–147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217–73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples.  相似文献   

8.
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.  相似文献   

9.

Background

Long-read sequencing technologies were launched a few years ago, and in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover by providing long-range information, it could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments.

Results

The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging as existing assemblers were not implemented to deal with long reads exhibiting close to 30% of errors. Here, we presented a hybrid approach developed to take advantage of data generated using MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1 and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads.

Conclusions

We described NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time even in small facilities. Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1519-z) contains supplementary material, which is available to authorized users.  相似文献   

10.
Determination of sequence variation within a genetic locus to develop clinically relevant databases is critical for molecular assay design and clinical test interpretation, so multisample pooling for Illumina genome analyzer (GA) sequencing was investigated using the RET proto-oncogene as a model. Samples were Sanger-sequenced for RET exons 10, 11, and 13–16. Ten samples with 13 known unique variants (“singleton variants” within the pool) and seven common changes were amplified and then equimolar-pooled before sequencing on a single flow cell lane, generating 36 base reads. For comparison, a single “control” sample was run in a different lane. After alignment, a 24-base quality score-screening threshold and 3` read end trimming of three bases yielded low background error rates with a 27% decrease in aligned read coverage. Sequencing data were evaluated using an established variant detection method (percent variant reads), by the presented subtractive correction method, and with SNPSeeker software. In total, 41 variants (of which 23 were singleton variants) were detected in the 10 pool data, which included all Sanger-identified variants. The 23 singleton variants were detected near the expected 5% allele frequency (average 5.17%±0.90% variant reads), well above the highest background error (1.25%). Based on background error rates, read coverage, simulated 30, 40, and 50 sample pool data, expected singleton allele frequencies within pools, and variant detection methods; ≥30 samples (which demonstrated a minimum 1% variant reads for singletons) could be pooled to reliably detect singleton variants by GA sequencing.  相似文献   

11.
Large-insert bacterial artificial chromosome (BAC) libraries are necessary for advanced genetics and genomics research. To facilitate gene cloning and characterization, genome analysis, and physical mapping of scallop, two BAC libraries were constructed from nuclear DNA of Zhikong scallop, Chlamys farreri Jones et Preston. The libraries were constructed in the BamHI and MboI sites of the vector pECBAC1, respectively. The BamHI library consists of 73,728 clones, and approximately 99% of the clones contain scallop nuclear DNA inserts with an average size of 110 kb, covering 8.0x haploid genome equivalents. Similarly, the MboI library consists of 7680 clones, with an average insert of 145 kb and no insert-empty clones, thus providing a genome coverage of 1.1x. The combined libraries collectively contain a total of 81,408 BAC clones arrayed in 212 384-well microtiter plates, representing 9.1x haploid genome equivalents and having a probability of greater than 99% of discovering at least one positive clone with a single-copy sequence. High-density clone filters prepared from a subset of the two libraries were screened with nine pairs of Overgos designed from the cDNA or DNA sequences of six genes involved in the innate immune system of mollusks. Positive clones were identified for every gene, with an average of 5.3 BAC clones per gene probe. These results suggest that the two scallop BAC libraries provide useful tools for gene cloning, genome physical mapping, and large-scale sequencing in the species.  相似文献   

12.
Molecular identification of mixed‐species pollen samples has a range of applications in various fields of research. To date, such molecular identification has primarily been carried out via amplicon sequencing, but whole‐genome shotgun (WGS) sequencing of pollen DNA has potential advantages, including (1) more genetic information per sample and (2) the potential for better quantitative matching. In this study, we tested the performance of WGS sequencing methodology and publicly available reference sequences in identifying species and quantifying their relative abundance in pollen mock communities. Using mock communities previously analyzed with DNA metabarcoding, we sequenced approximately 200Mbp for each sample using Illumina HiSeq and MiSeq. Taxonomic identifications were based on the Kraken k‐mer identification method with reference libraries constructed from full‐genome and short read archive data from the NCBI database. We found WGS to be a reliable method for taxonomic identification of pollen with near 100% identification of species in mixtures but generating higher rates of false positives (reads not identified to the correct taxon at the required taxonomic level) relative to rbcL and ITS2 amplicon sequencing. For quantification of relative species abundance, WGS data provided a stronger correlation between pollen grain proportion and sequence read proportion, but diverged more from a 1:1 relationship, likely due to the higher rate of false positives. Currently, a limitation of WGS‐based pollen identification is the lack of representation of plant diversity in publicly available genome databases. As databases improve and costs drop, we expect that eventually genomics methods will become the methods of choice for species identification and quantification of mixed‐species pollen samples.  相似文献   

13.
Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone.  相似文献   

14.
Field pennycress (Thlaspi arvense L.) is being domesticated as a new winter cover crop and biofuel species for the Midwestern United States that can be double-cropped between corn and soybeans. A genome sequence will enable the use of new technologies to make improvements in pennycress. To generate a draft genome, a hybrid sequencing approach was used to generate 47 Gb of DNA sequencing reads from both the Illumina and PacBio platforms. These reads were used to assemble 6,768 genomic scaffolds. The draft genome was annotated using the MAKER pipeline, which identified 27,390 predicted protein-coding genes, with almost all of these predicted peptides having significant sequence similarity to Arabidopsis proteins. A comprehensive analysis of pennycress gene homologues involved in glucosinolate biosynthesis, metabolism, and transport pathways revealed high sequence conservation compared with other Brassicaceae species, and helps validate the assembly of the pennycress gene space in this draft genome. Additional comparative genomic analyses indicate that the knowledge gained from years of basic Brassicaceae research will serve as a powerful tool for identifying gene targets whose manipulation can be predicted to result in improvements for pennycress.  相似文献   

15.
The parasitic nematode, Brugia malayi, causes lymphatic filariasis in humans, which in severe cases leads to the condition known as elephantiasis. The parasite contains an endosymbiotic alpha-proteobacterium of the genus Wolbachia that is required for normal worm development and fecundity and is also implicated in the pathology associated with infections by these filarial nematodes. Bacterial artificial chromosome libraries were constructed from B. malayi DNA and provide over 11-fold coverage of the nematode genome. Wolbachia genomic fragments were simultaneously cloned into the libraries giving over 5-fold coverage of the 1.1 Mb bacterial genome. A physical framework for the Wolbachia genome was developed by construction of a plasmid library enriched for Wolbachia DNA as a source of sequences to hybridise to high-density bacterial artificial chromosome colony filters. Bacterial artificial chromosome end sequencing provided additional Wolbachia probe sequences to facilitate assembly of a contig that spanned the entire genome. The Wolbachia sequences provided a marker approximately every 10 kb. Four rare-cutting restriction endonucleases were used to restriction map the genome to a resolution of approximately 60 kb and demonstrate concordance between the bacterial artificial chromosome clones and native Wolbachia genomic DNA. Comparison of Wolbachia sequences to public databases using BLAST algorithms under stringent conditions allowed confident prediction of 69 Wolbachia peptide functions and two rRNA genes. Comparison to closely related complete genomes revealed that while most sequences had orthologs in the genome of the Wolbachia endosymbiont from Drosophila melanogaster, there was no evidence for long-range synteny. Rather, there were a few cases of short-range conservation of gene order extending over regions of less than 10 kb. The molecular scaffold produced for the genome of the Wolbachia from B. malayi forms the basis of a genomic sequencing effort for this bacterium, circumventing the difficult challenge of purifying sufficient endosymbiont DNA from a tropical parasite for a whole genome shotgun sequencing strategy.  相似文献   

16.
17.
Radish (Raphanus sativus L.) is an edible root vegetable crop that is cultivated worldwide and whose genome has been sequenced. Here we report the complete nucleotide sequence of the radish cultivar WK10039 chloroplast (cp) genome, along with a de novo assembly strategy using whole genome shotgun sequence reads obtained by next generation sequencing. The radish cp genome is 153,368 bp in length and has a typical quadripartite structure, composed of a pair of inverted repeat regions (26,217 bp each), a large single copy region (83,170 bp), and a small single copy region (17,764 bp). The radish cp genome contains 87 predicted protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence analysis revealed the presence of 91 simple sequence repeats (SSRs) in the radish cp genome.  相似文献   

18.
微生物全基因组鸟枪法测序   总被引:4,自引:0,他引:4  
罗春清  杨焕明 《遗传》2002,24(3):310-314
全基因组测序主要有二种策略,一种是分级鸟枪法测序,另一种是全基因组鸟枪法测序。微生物是一种十分重要的遗传资源,运用全基因组鸟枪法可以方便、快捷地完成其基因组的测序任务。本文对微生物全基因组鸟枪法测序中文库构建、插入片段的长短比例、反应投入量、拼接以及补洞等问题作了较细致的描述,有些步骤作了举例说明。 Abstract:Two strategies introduced for whole genome sequencing,one is clone by clone method,the other is whole genome shotgun sequencing,for microbes which are very important to us,whole genome shotgun sequencing method is very convenient.In this article we discussed the library construction、long-to-short-ratio of insert,、total number of reads should be sequenced、assembly and gap filling technologies of the whole microbial genome shotgun sequencing method while some examples presented.  相似文献   

19.
Rubrobacter radiotolerans strain RSPS-4 is a slightly thermophilic member of the phylum “Actinobacteria” isolated from a hot spring in São Pedro do Sul, Portugal. This aerobic and halotolerant bacterium is also extremely resistant to gamma and UV radiation, which are the main reasons for the interest in sequencing its genome. Here, we present the complete genome sequence of strain RSPS-4 as well as its assembly and annotation. We also compare the gene sequence of this organism with that of the type strain of the species R. radiotolerans isolated from a hot spring in Japan. The genome of strain RSPS-4 comprises one circular chromosome of 2,875,491 bp with a G+C content of 66.91%, and 3 circular plasmids of 190,889 bp, 149,806 bp and 51,047 bp, harboring 3,214 predicted protein coding genes, 46 tRNA genes and a single rRNA operon.  相似文献   

20.

Background

Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.

Results

To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.

Conclusion

Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号