Similar articles
 Found 20 similar articles (search time: 31 ms)
1.

Background

Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood.

Results

We used 250 bp Illumina MiSeq paired-end reads from libraries of different sizes, generated from genomic DNA of Escherichia coli DH1 and Streptococcus parasanguinis FW213, to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes introduces insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results.

Conclusions

In summary, this study provides a systematic evaluation of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1859-8) contains supplementary material, which is available to authorized users.

2.
3.
4.
5.
Genome Biology, 2014, 15(3): R59

Background

The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

Results

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.

Conclusions

In addition to their value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.

6.

Background

Different high-throughput nucleic acid sequencing platforms are available, but a trade-off currently exists between the cost and number of reads that can be generated and the read length that can be achieved.

Methodology/Principal Findings

We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
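The idea of joining overlapping mate-pair sequences into one composite read can be illustrated with a minimal sketch. This is not the SHERA algorithm: the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the rule of keeping the forward read's bases in the overlap are all assumptions made for illustration. Real tools typically use base quality scores to resolve disagreements and to score candidate overlaps.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_pair(fwd, rev, min_overlap=10, max_mismatch_frac=0.1):
    """Join a forward read and its mate into one composite read.

    Hypothetical helper: tries the longest suffix/prefix overlap first and
    accepts the first one whose mismatch fraction is within the threshold.
    Returns None when no acceptable overlap is found.
    """
    mate = reverse_complement(rev)  # bring the mate onto the forward strand
    for olen in range(min(len(fwd), len(mate)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(fwd[-olen:], mate[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            # Assumption: trust the forward read inside the overlap.
            return fwd + mate[olen:]
    return None


# Toy example: a 30 bp fragment read 20 bp from each end (10 bp overlap).
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
fwd_read = fragment[:20]
rev_read = reverse_complement(fragment[10:])
merged = merge_pair(fwd_read, rev_read)
```

In this toy case the composite read reconstructs the full fragment, which is longer than either input read.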

Conclusions/Significance

This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing.

7.
8.

Background

Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.

Principal Findings

We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.
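Since the results above are reported as weighted median scaffold lengths (N50), a short sketch of how N50 is usually computed may help. It uses the common definition: the largest length L such that scaffolds of length at least L together cover at least half of the total assembled length. The function name and the example lengths are illustrative and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one length")


# Made-up scaffold lengths (in kbp) for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

Here the total is 290, and the two longest scaffolds (80 + 70 = 150) are the first to cover at least half of it, so the N50 is 70.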

Conclusions

These algorithms extend the utility of short-read-only assemblies to large, complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.

9.
10.

Background

Tree-killing bark beetles (Coleoptera, Scolytinae) are among the most economically and ecologically important forest pests in the northern hemisphere. Induction of terpenoid-based oleoresin has long been considered important in conifer defense against bark beetles, but it has been difficult to demonstrate a direct correlation between terpene levels and resistance to bark beetle colonization.

Methods

To test for inhibitory effects of induced terpenes on colonization by the spruce bark beetle Ips typographus (L.) we inoculated 20 mature Norway spruce Picea abies (L.) Karsten trees with a virulent fungus associated with the beetle, Ceratocystis polonica (Siem.) C. Moreau, and investigated induced terpene levels and beetle colonization in the bark.

Results

Fungal inoculation induced very strong and highly variable terpene accumulation 35 days after inoculation. Trees with high induced terpene levels (n = 7) had only 4.9% as many beetle attacks (5.1 vs. 103.5 attacks m⁻²) and 2.6% as much gallery length (0.029 m m⁻² vs. 1.11 m m⁻²) as trees with low terpene levels (n = 6). There was a highly significant rank correlation between terpene levels at day 35 and beetle colonization in individual trees. The relationship between induced terpene levels and beetle colonization was not linear but thresholded: above a low threshold concentration of ∼100 mg terpene g⁻¹ dry phloem, trees suffered only moderate beetle colonization, and above a high threshold of ∼200 mg terpene g⁻¹ dry phloem, trees were virtually unattacked.
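The two percentages quoted above follow directly from the reported means. The snippet below is only an arithmetic check using the figures given in the abstract; the helper name `percent_of` is hypothetical.

```python
def percent_of(value, reference):
    """Express value as a percentage of reference, rounded to one decimal."""
    return round(100 * value / reference, 1)


# Means reported in the abstract: high-terpene trees vs. low-terpene trees.
attack_percent = percent_of(5.1, 103.5)    # beetle attacks per square metre
gallery_percent = percent_of(0.029, 1.11)  # gallery length (m) per square metre
```

Both values reproduce the figures in the text: 4.9% for attacks and 2.6% for gallery length.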

Conclusion/Significance

This is the first study demonstrating a dose-dependent relationship between induced terpenes and tree resistance to bark beetle colonization under field conditions, indicating that terpene induction may be instrumental in tree resistance. This knowledge could be useful for developing management strategies that decrease the impact of tree-killing bark beetles.

11.

Background

DNA sequencing techniques used to estimate biodiversity, such as DNA barcoding, may reveal cryptic species. However, disagreements between barcoding and morphological data have already led to controversy. Species delimitation should therefore not be based on mtDNA alone. Here, we explore the use of nDNA and bioclimatic modelling in a new species of aquatic beetle revealed by mtDNA sequence data.

Methodology/Principal Findings

The aquatic beetle fauna of Australia is characterised by high degrees of endemism, including local radiations such as the genus Antiporus. Antiporus femoralis was previously considered to exist in two disjunct, but morphologically indistinguishable populations in south-western and south-eastern Australia. We constructed a phylogeny of Antiporus and detected a deep split between these populations. Diagnostic characters from the highly variable nuclear protein encoding arginine kinase gene confirmed the presence of two isolated populations. We then used ecological niche modelling to examine the climatic niche characteristics of the two populations. All results support the status of the two populations as distinct species. We describe the south-western species as Antiporus occidentalis sp.n.

Conclusion/Significance

In addition to nDNA sequence data and extended use of mitochondrial sequences, ecological niche modelling has great potential for delineating morphologically cryptic species.

12.

Background

The adzuki bean weevil, Callosobruchus chinensis L., is one of the most destructive pests of stored legume seeds such as mungbean, cowpea, and adzuki bean, and usually causes considerable loss in the quantity and quality of seeds during transportation and storage. However, because genetic information on this pest is lacking, a series of genetic questions remain largely unanswered, including population genetic structure, kinship, and biotype abundance. Co-dominant microsatellite markers offer great resolving power for addressing these questions. Here, we report rapid microsatellite isolation from C. chinensis via high-throughput sequencing.

Principal Findings

In this study, 94,560,852 quality-filtered and trimmed reads were obtained for genome assembly using Illumina paired-end sequencing technology. De novo assembly generated a genome with a total length of 497,124,785 bp, comprising 403,113 high-quality contigs. More than 6800 SSR loci were detected, a suite of 6303 primer pairs was designed, and 500 of these were randomly selected for validation. Of these, 196 primer pairs (39.2%) produced reproducible amplicons that were polymorphic among 8 C. chinensis genotypes collected from different geographical regions. Twenty of the 196 polymorphic SSR markers were used to analyze the genetic diversity of 18 C. chinensis populations. The results showed that the twenty SSR loci were highly polymorphic among these populations.
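As an illustration of what detecting SSR (microsatellite) loci in assembled contigs involves, the sketch below scans a sequence for perfect tandem repeats with a regular expression. It is not the tool or the thresholds used in the study: the function name, the 2 to 6 bp unit range, and the minimum of four repeats are assumptions chosen for the example. The last line simply rechecks the validation rate reported above.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    Hypothetical helper: a unit of min_unit..max_unit bases must occur at
    least min_repeats times in a row. Runs of a single base can also be
    reported as repeats of a two-base unit such as "AA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy contig containing one (AC)5 repeat.
toy_contig = "TTG" + "AC" * 5 + "GGT"
toy_hits = find_ssrs(toy_contig)

# Validation rate from the abstract: 196 polymorphic primer pairs out of 500 tested.
validation_rate = round(100 * 196 / 500, 1)
```

Dedicated SSR finders also handle compound and imperfect repeats and apply unit-specific repeat thresholds, which this sketch does not attempt.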

Conclusions

This study presents the first report of genome sequencing and de novo assembly for C. chinensis and demonstrates the feasibility of generating large-scale sequence information and isolating SSR loci by Illumina paired-end sequencing. Our results provide a valuable resource for C. chinensis research. These novel markers are valuable for future studies of genetic mapping, trait association, genetic structure and kinship in C. chinensis.

13.

Context

Due to a long history of intensive forest exploitation, few European beech (Fagus sylvatica L.) old-growth forests have been preserved in Europe.

Material and Methods

We studied two beech forest reserves in southern Slovenia. We examined the structural characteristics of the two forest reserves based on data from sample plots and complete inventory obtained from four previous forest management plans. To gain a better understanding of disturbance dynamics, we used aerial imagery to study the characteristics of canopy gaps over an 11-year period in the Kopa forest reserve and a 20-year period in the Gorjanci forest reserve.

Results

The results suggest that these forests are structurally heterogeneous over small spatial scales. Gap size analysis showed that gaps smaller than 500 m² are the dominant driving force of stand development. The percentage of forest area in canopy gaps ranged from 3.2 to 4.5% in the Kopa forest reserve and from 9.1 to 10.6% in the Gorjanci forest reserve. These forests exhibit relatively high annual rates of coverage by newly established (0.15 and 0.25%) and closed (0.08 and 0.16%) canopy gaps. New gap formation is dependent on senescent trees located throughout the reserve.

Conclusion

We conclude that these stands are not even-sized, but rather unevenly structured. This is due to the fact that the disturbance regime is characterized by low intensity, small-scale disturbances.

14.

Background

The tephritid fruit flies include a number of economically important pests of horticulture, with a large accumulated body of research on their biology and control. Amongst the Tephritidae, the genus Bactrocera, containing over 400 species, presents various species groups of potential utility for genetic studies of speciation, behaviour or pest control. In Australia, there exists a triad of closely-related, sympatric Bactrocera species which do not mate in the wild but which, despite distinct morphologies and behaviours, can be force-mated in the laboratory to produce fertile hybrid offspring. To exploit the opportunities offered by genomics, such as the efficient identification of genetic loci central to pest behaviour and to the earliest stages of speciation, investigators require genomic resources for future investigations.

Results

We produced a draft de novo genome assembly of Australia’s major tephritid pest species, Bactrocera tryoni. The male genome (650–700 Mbp) includes approximately 150 Mb of interspersed repetitive DNA sequences and 60 Mb of satellite DNA. Assessment using conserved core eukaryotic sequences indicated 98% completeness. Over 16,000 MAKER-derived gene models showed a large degree of overlap with other Dipteran reference genomes. The sequence of the ribosomal RNA transcribed unit was also determined. Unscaffolded assemblies of B. neohumeralis and B. jarvisi were then produced; comparison with B. tryoni showed that the species are more closely related than any Drosophila species pair. The similarity of the genomes was exploited to identify 4924 potentially diagnostic indels between the species, all of which occur in non-coding regions.

Conclusions

This first draft B. tryoni genome resembles other dipteran genomes in terms of size and putative coding sequences. For all three species included in this study, we have identified a comprehensive set of non-redundant repetitive sequences, including the ribosomal RNA unit, and have quantified the major satellite DNA families. These genetic resources will facilitate the further investigations of genetic mechanisms responsible for the behavioural and morphological differences between these three species and other tephritids. We have also shown how whole genome sequence data can be used to generate simple diagnostic tests between very closely-related species where only one of the species is scaffolded.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1153) contains supplementary material, which is available to authorized users.

15.
16.
17.

Background

The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.

Results

We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10–20 kb (and even, in extreme cases, up to ∼35 kb). We also characterized the impact of low-quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small-insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.

Conclusions

In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.

18.

Background

Forecasting the effects of global changes on high altitude ecosystems requires an understanding of the long-term relationships between biota and forcing factors to identify resilience thresholds. Fire is a crucial forcing factor: both fuel build-up from land-abandonment in European mountains, and more droughts linked to global warming are likely to increase fire risks.

Methods

To assess the vegetation response to fire on a millennium time-scale, we analyzed evidence of stand-to-local vegetation dynamics derived from sedimentary plant macroremains from two subalpine lakes. Paleobotanical reconstructions at high temporal resolution, together with a fire frequency reconstruction inferred from sedimentary charcoal, were analyzed by Superposed Epoch Analysis to model plant behavior before, during and after fire events.

Principal Findings

We show that fuel build-up from arolla pine (Pinus cembra) always precedes fires, which are immediately followed by a rapid increase of birch (Betula sp.), then by ericaceous species after 25–75 years, and by herbs after 50–100 years. European larch (Larix decidua), which is the natural co-dominant species of subalpine forests with Pinus cembra, is not sensitive to fire, while the abundance of Pinus cembra is altered within a 150-year period after fires. A long-term trend in vegetation dynamics is apparent, wherein species that abound later in succession are the functional drivers, loading the environment with fuel for fires. This system can only be functional if fires are mainly driven by external factors (e.g. climate), with the mean interval between fires being longer than the minimum time required to reach the late successional stage, here 150 years.

Conclusion

Current global warming conditions, which increase drought occurrences, combined with the abandonment of land in European mountain areas, create ideal ecological conditions for the ignition and spread of fire. A fire return interval of less than 150 years would threaten the dominant species and might override the resilience of subalpine forests.

19.
20.

Background

Many theoretical studies have predicted that larch species will decrease drastically in China under future climatic changes. However, the responses of the structure and composition of Gmelin larch (Larix gmelinii var. gmelinii) forests to climatic changes have rarely been reported.

Methodology/Principal Findings

A field survey was conducted to examine the structure and composition of natural Gmelin larch forests along a climatic gradient. Stepwise linear regression analyses incorporating linear and quadratic components of climatic and non-climatic factors were performed on the structural and compositional attributes of those natural Gmelin larch forests. Isothermality, Max Temperature of Warmest Month (TempWarmestMonth), Precipitation of Wettest Month (PrecipWettestMonth), Precipitation Seasonality (PrecipSeasonality) and Precipitation of Driest Quarter (PrecipDriestQuarter) were observed to be effective climatic factors in controlling the structure and composition of Gmelin larch forests. Isothermality significantly affected total basal area of larch, while TempWarmestMonth, PrecipWettestMonth and PrecipSeasonality significantly affected total basal area of Mongolian pine, and PrecipDriestQuarter significantly affected mean DBH of larch, stand density of larch and total basal area of spruce and fir.

Conclusions/Significance

Summer and winter temperatures and precipitation are all predicted to increase in the future in Northeast China. Our results showed an increase in the total basal area of spruce and fir, suppression of regeneration and a decrease in the stand density of larch under increased winter precipitation, and a decrease in the total basal area of larch under increased summer temperature in the region of current Gmelin larch forest. Therefore, we suggest that larch would decrease and spruce and fir would increase in the region of future Gmelin larch forest.
