首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 343 毫秒
1.
One of the most complex and computationally intensive tasks of genome sequence analysis is genome assembly. Even today, few centres have the resources, in both software and hardware, to assemble a genome from the thousands or millions of individual sequences generated in a whole-genome shotgun sequencing project. With the rapid growth in the number of sequenced genomes has come an increase in the number of organisms for which two or more closely related species have been sequenced. This has created the possibility of building a comparative genome assembly algorithm, which can assemble a newly sequenced genome by mapping it onto a reference genome. We describe here a novel algorithm for comparative genome assembly that can accurately assemble a typical bacterial genome in less than four minutes on a standard desktop computer. The software is available as part of the open-source AMOS project.  相似文献   

2.
《Genomics》2022,114(5):110441
Chloridea subflexa and Chloridea virescens are a pair of closely related noctuid species exhibiting pheromone-based sexual isolation and divergent host plant preferences. We produced a novel Illumina short read C. subflexa genome assembly and an improved C. virescens genome assembly, which offer opportunities to study the genomic basis for evolutionarily important traits in this lepidopteran family with few genomic resources. We then examined the feasibility of reference-assisted assembly, an approach that leverages existing high quality genomic resources for genome improvement in closely related taxa and applied it to our Heliothine genomes. Our work demonstrates that reference-assisted assembly has the potential to enhance contiguity and completeness of existing insect genomic resources with minimal additional laboratory costs. We conclude by discussing both the potential and pitfalls of reference-assisted assembly according to the intended downstream assembly application.  相似文献   

3.
We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at .  相似文献   

4.
We introduce a data structure called a superword array for finding quickly matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.  相似文献   

5.
A case study in genome-level fragment assembly   总被引:2,自引:0,他引:2  
MOTIVATION: We use the fact of two teams independently sequencing the one megabase genome of Borrelia burgdorferi as an opportunity to study the accuracy of genome-level assembly. RESULTS: We compare the results of three different assembly programs (PHRAP, TIGR Assembler, and STROLL) on the DNA fragments used in both the Brookhaven and TIGR sequencing projects. We also describe the algorithms and data structures used in our assembly program STROLL, which was used in the Brookhaven Borrelia project.  相似文献   

6.
Accurate base-assignment in repeat regions of a whole genome shotgun assembly is an unsolved problem. Since reads in repeat regions cannot be easily attributed to a unique location in the genome, current assemblers may place these reads arbitrarily. As a result, the base-assignment error rate in repeats is likely to be much higher than that in the rest of the genome. We developed an iterative algorithm, EULER-AIR, that is able to correct base-assignment errors in finished genome sequences in public databases. The Wolbachia genome is among the best finished genomes. Using this genome project as an example, we demonstrated that EULER-AIR can 1) discover and correct base-assignment errors, 2) provide accurate read assignments, 3) utilize finishing reads for accurate base-assignment, and 4) provide guidance for designing finishing experiments. In the genome of Wolbachia, EULER-AIR found 16 positions with ambiguous base-assignment and two positions with erroneous bases. Besides Wolbachia, many other genome sequencing projects have significantly fewer finishing reads and, hence, are likely to contain more base-assignment errors in repeats. We demonstrate that EULER-AIR is a software tool that can be used to find and correct base-assignment errors in a genome assembly project  相似文献   

7.
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.  相似文献   

8.
Chinese hamster Ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shot‐gun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re‐scaffolding and gap‐filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as ”high confidence regions“ which accounts for at least 78 % of the assembled genome. Further, a genome wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high‐quality scaffolds. Genome scale collinearity was complemented with EST based synteny which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines.  相似文献   

9.
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich and GC‐rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long‐read, linked‐read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC‐rich microchromosomes and the repeat‐rich W chromosome. Telomere‐to‐telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.  相似文献   

10.
《Trends in genetics : TIG》2023,39(3):175-186
Quality control is essential for genome assemblies; however, a consensus has yet to be reached on what metrics should be adopted for the evaluation of assembly quality. N50 is widely used for contiguity measurement, but its effectiveness is constantly in question. Prevailing metrics for the completeness evaluation focus on gene space, yet challenging areas such as tandem repeats are commonly overlooked. Achieving correctness has become an indispensable dimension for quality control, while prevailing assembly releases lack scores reflecting this aspect. We propose a metric set with a set of statistic indexes for effective, comprehensive evaluation of assemblies and provide a score of a finished assembly for each metric, which can be utilized as a benchmark for achieving high-quality genome assemblies.  相似文献   

11.
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.  相似文献   

12.
A whole-genome assembly of the domestic cow, Bos taurus   总被引:4,自引:0,他引:4  

Background

The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods.

Results

We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions.

Conclusions

By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.  相似文献   

13.
《Genome biology》2014,15(3):R59

Background

The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

Results

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.

Conclusions

In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.  相似文献   

14.
转录本组装是基于第二代测序技术研究转录组的关键环节,其质量好坏直接影响到下游结果的可靠性,也是目前的研究热点与难点。转录本组装方法可以分为Genome-guided和de novo两类,它们在理论基础与算法实现方面各有优劣。转录本组装质量的高低依赖于PCR扩增错误率、第二代测序技术准确率、组装算法和参考基因组完整性等方面,而现有的算法还无法完全处理由这些因素带来的影响。本文从转录本组装方法与软件、影响组装质量的因素和对组装质量的评价指标等方面进行讨论,以期能指导纯生物学家对分析软件的选择。  相似文献   

15.
《Epigenetics》2013,8(10):1329-1338
Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.  相似文献   

16.
Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.  相似文献   

17.
Next-generation transcriptome assembly   总被引:1,自引:0,他引:1  
  相似文献   

18.
19.
The process of coordinated DNA replication and nucleosome assembly, termed replication-coupled (RC) nucleosome assembly, is important for the maintenance of genome integrity. Loss of genome integrity is linked to aging and cancer. RC nucleosome assembly involves deposition of histone H3–H4 by the histone chaperones CAF-1, Rtt106 and Asf1 onto newly-replicated DNA. Coordinated actions of these three his-tone chaperones are regulated by modifications on the histone proteins. One such modification is histone H3 lysine 56 acetylation (H3K56Ac), a mark of newly-synthesized histone H3 that regulates the interaction between H3–H4 and the histone chaperones CAF-1 and Rtt106 following DNA replication and DNA repair. Recently, we have shown that the lysine acetyltransferase Gcn5 and H3 N-terminal tail lysine acetylation also regulates the interaction between H3–H4 and CAF-1 to promote the deposition of newly-synthesized histones. Genetic studies indicate that Gcn5 and Rtt109, the H3K56Ac lysine acetyltransferase, function in parallel to maintain genome stability. Utilizing synthetic genetic array analysis, we set out to identify additional genes that function in parallel with Gcn5 in response to DNA damage. We summarize here the role of Gcn5 in nucleosome assembly and suggest that Gcn5 impacts genome integrity via multiple mechanisms, including nucleosome assembly.Key words: Gen5, Rtt109, chromatin, nucleosome assembly, genome integrity  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号