Similar literature
 20 similar documents found (search time: 15 ms)
1.
A streaming assembly pipeline utilising real-time Oxford Nanopore Technology (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool using the assembly graph instead of the separated pre-assembly contigs. It is able to produce a more complete genome assembly by resolving the path finding problem on the assembly graph using long reads as the traversing guide. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while still maintaining a low computational cost. npGraph also offers a graphical user interface (GUI) that provides a real-time visualisation of the progress of assembly. The tool and source code are available at https://github.com/hsnguyen/assembly.

2.
3.
We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences.

4.
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion’s share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.

5.
Despite the power of massively parallel sequencing platforms, a drawback is the short length of the sequence reads produced. We demonstrate that short reads can be locally assembled into longer contigs using paired-end sequencing of restriction-site associated DNA (RAD-PE) fragments. We use this RAD-PE contig approach to identify single nucleotide polymorphisms (SNPs) and determine haplotype structure in threespine stickleback and to sequence E. coli and stickleback genomic DNA with overlapping contigs of several hundred nucleotides. We also demonstrate that adding a circularization step allows the local assembly of contigs up to 5 kilobases (kb) in length. The ease of assembly and accuracy of the individual contigs produced from each RAD site sequence suggest that RAD-PE sequencing is a useful way to convert genome-wide short reads into individually assembled sequences hundreds or thousands of nucleotides long.

6.
7.
Genomics, 2020, 112(3): 2379-2384
Haploid cell lines are a valuable research tool with broad applicability for genetic assays. As such, the fully haploid human cell line, eHAP1, has been used in a wide array of studies. However, the absence of a corresponding reference genome sequence for this cell line has limited the potential for more widespread applications to experiments dependent on available sequence, like capture-clone methodologies. We generated ~15× coverage Nanopore long reads from ten GridION flowcells and used these data to assemble a de novo draft genome with minimap and miniasm, which we subsequently polished using Racon. This assembly was further polished using previously generated, low-coverage Illumina short reads with Pilon and ntEdit. This resulted in a hybrid eHAP1 assembly with >90% complete BUSCO scores. We further assessed the eHAP1 long read data for structural variants using Sniffles and identified a variety of rearrangements, including a previously established Philadelphia translocation. Finally, we demonstrate how some of these variants overlap open chromatin regions, potentially impacting regulatory regions. By integrating both long and short reads, we generated a high-quality reference assembly for eHAP1 cells. The union of long and short reads demonstrates the utility of combining sequencing platforms to generate a high-quality reference genome de novo solely from low coverage data. We expect the resulting eHAP1 genome assembly to provide a useful resource to enable novel experimental applications in this important model cell line.

8.
The stability of the elements of eleven transposon families (412, B104, blood, 297, 1731, G, copia, mdg 4, hobo, jockey and I) has been compared by the Southern technique among individuals of a Drosophila line that has been subjected to 30 generations of sister sib matings. The 412, B104, blood, 297, 1731 and G elements appear stable. Heterochromatic copia and hobo elements and euchromatic I elements appear highly polymorphic. In addition, copia, mdg 4, jockey and I elements undergo an instability resulting in significant variations in relative intensity among autoradiographic bands. The extent of the polymorphisms detected strongly suggests de novo rearrangements of transposable elements.

9.
Distribution of transposable elements in prokaryotes (total citations: 5; self-citations: 0; citations by others: 5)
We consider models for the distribution of the number of elements per host genome for families of transposable elements (TEs). The hosts are assumed to be prokaryotes. These models assume a constant rate of infection of uninfected hosts by TEs, replicative transposition within each host, and a reduction of the fitness of a host dependent on the number of TEs it contains. No provision was made for the deletion of individual TEs within a host or for recombination, since both are relatively rare events in prokaryotes. These models mostly assume that the TE performs no function for the host, and that the reduction in fitness with increased copy number is due to effects such as the impairment of beneficial genes by transposition or homologous recombination. We also consider a model in which the TEs can convey a selective advantage to the host. The equilibrium distributions of copy number are determined for these models, and are of a variety of classical types. Relevant parameters of the models are estimated using data on the distribution of insertion sequences in natural isolates of Escherichia coli.

10.
Transposable elements constitute 2-5% of the genome content in trypanosomatid parasites. Some of them are involved in critical cellular functions, such as the regulation of gene expression in Leishmania spp. In this review, we highlight the remarkable role extinct transposable elements can play as the source of potential new functions.

11.
12.
Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that the most efficient technology produces the shortest read lengths. Short-read sequencing has been applied successfully to resequence the human genome and those of other species but not to whole-genome sequencing of novel organisms. Here we describe the sequencing and assembly of a novel clinical isolate of Pseudomonas aeruginosa, strain PAb1, using very short read technology. From 8,627,900 reads, each 33 nucleotides in length, we assembled the genome into one scaffold of 76 ordered contiguous sequences containing 6,290,005 nucleotides, including one contig spanning 512,638 nucleotides, plus an additional 436 unordered contigs containing 416,897 nucleotides. Our method includes a novel gene-boosting algorithm that uses amino acid sequences from predicted proteins to build a better assembly. This study demonstrates the feasibility of very short read sequencing for the sequencing of bacterial genomes, particularly those for which a related species has been sequenced previously, and expands the potential application of this new technology to most known prokaryotic species.

13.
Genomics, 2021, 113(6): 4173-4183
Cherries are stone fruits and belong to the economically important plant family of Rosaceae with worldwide cultivation of different species. The ground cherry, Prunus fruticosa Pall., is an ancestor of cultivated sour cherry, an important tetraploid cherry species. Here, we present a long read chromosome-level draft genome assembly and related plastid sequences using the Oxford Nanopore Technology PromethION platform and R10.3 pore type. We generated a final consensus genome sequence of 366 Mb comprising eight chromosomes. The N50 scaffold was ~44 Mb with the longest chromosome being 66.5 Mb. The chloroplast and mitochondrial genomes were 158,217 bp and 383,281 bp long, which is in accordance with previously published plastid sequences. This is the first report of the genome of ground cherry (P. fruticosa) sequenced by long read technology only. The datasets obtained from this study provide a foundation for future breeding, molecular and evolutionary analysis in Prunus studies.

14.
Transposable elements (TEs) are selfish elements that cause harmful mutations, contribute to the structure of regulatory networks and shape the architecture of genomes. Natural selection against their harmful effects has long been considered the dominant force limiting their spread. It is now clear that a genome defense system of RNA-mediated silencing also plays a crucial role in limiting TE proliferation. A full understanding of TE evolutionary dynamics must consider how these forces jointly determine their proliferation within genomes. Here I consider these forces from two perspectives: dynamics within populations and evolutionary games within the germline. The analysis of TE dynamics from these two perspectives promises to provide new insight into their role in evolution.

15.
Nomenclature of transposable elements in prokaryotes. (total citations: 24; self-citations: 0; citations by others: 24)
Transposable elements are defined as specific DNA segments that can repeatedly insert into a few or many sites in a genome. They are classified as simple IS elements, more complex Tn transposons and self-replicating episomes. Definitions and nomenclature rules for these three classes of prokaryotic transposable elements are specified.

16.
We present a global analysis of the distribution of 43 transposable elements (TEs) in 228 species of the Drosophila genus from our data and data from the literature. Data on chromosome localization come from in situ hybridization and presence/absence of the elements from Southern analyses. This analysis shows great differences between TE distributions, even among closely related species. Some TEs are distributed according to the phylogeny of their host species; others do not entirely follow the phylogeny, suggesting horizontal transfers. A higher number of insertion sites for most TEs in the genome of D. melanogaster is observed when compared with that in D. simulans. This suggests either intrinsic differences in genomic characteristics between the two species, or the influence of differing effective population sizes, although biases due to the use of TE probes coming mostly from D. melanogaster and to the way TEs are initially detected in species cannot be ruled out. Data on TEs more specific to the species under consideration are necessary for a better understanding of their distribution in organisms and populations. This revised version was published online in July 2006 with corrections to the Cover Date.

17.
Silencing of transposable elements in plants. (total citations: 1; self-citations: 0; citations by others: 1)

18.
Nomenclature of transposable elements in prokaryotes. (total citations: 9; self-citations: 0; citations by others: 9)
Transposable elements are defined as specific DNA segments that can repeatedly insert into a few or many sites in a genome. They are classified as simple IS elements, more complex Tn transposons, and self-replicating episomes. Definitions and nomenclature rules for these three classes of prokaryotic transposable elements are specified.

19.
We have characterised a new family of repetitive sequences that we have named Mrs (maize repetitive sequences). Mrs elements are associated with different maize genes and seem to be specific for the genome of Zea species. Mrs elements are short, AT-rich and contain terminal inverted repeats (TIRs). The sequence of their TIRs, as well as the fact that they are flanked by short repetitions that tend to be TAA, allows us to propose Mrs as a new subfamily of Tourist transposable elements.

20.
Despite the wide distribution of transposable elements (TEs) in mammalian genomes, part of their evolutionary significance remains to be discovered. Today there is a substantial amount of evidence showing that TEs are involved in the generation of new exons in different species. In the present study, we searched 22,805 genes and reported the occurrence of TE-cassettes in coding sequences of 542 cow genes using the RepeatMasker program. Despite the significant number (542) of genes with TE insertions in exons, only 14 (2.6%) of them were translated into protein, which we characterized as chimeric genes. From these chimeric genes, only the FAST kinase domains 3 (FASTKD3) gene, present on chromosome BTA 20, is a functional gene and showed evidence of the exaptation event. The genome sequence analysis showed that the last exon coding sequence of bovine FASTKD3 is approximately 85% similar to the ART2A retrotransposon sequence. In addition, comparison among FASTKD3 proteins shows that the last exon is very divergent from those of Homo sapiens, Pan troglodytes and Canis familiaris. We suggest that the gene structure of the bovine FASTKD3 gene could have originated by several ectopic recombinations between TE copies. Additionally, the absence of TE sequences in all other species analyzed suggests that the TE insertion is clade-specific, mainly in the ruminant lineage.
