1.

Plantagora: modeling whole genome sequencing and assembly of plant genomes

Barthelson R McFarlin AJ Rounsley SD Young S 《PloS one》2011,6(12):e28436

Background

Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.

Methodology/Principal Findings

For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.

Conclusions/Significance

Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further.
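The read-simulation step described in this abstract can be illustrated with a minimal sketch. It is not the Plantagora platform's code: the function name `simulate_paired_reads`, its parameters, and the error-free sampling model are all assumptions made for illustration. It draws fragments of a fixed insert size from a genome string and returns the two end reads of each fragment, the second one reverse-complemented.

```python
import random


def simulate_paired_reads(genome, n_pairs, read_len, insert_size, seed=0):
    """Sample error-free paired-end reads from a genome string.

    Hypothetical helper for illustration only: each pair comes from one
    fragment of length `insert_size`; the forward read is the fragment's
    first `read_len` bases and the reverse read is the reverse complement
    of its last `read_len` bases.
    """
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    complement = str.maketrans("ACGT", "TGCA")
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(0, len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        forward = fragment[:read_len]
        reverse = fragment[-read_len:].translate(complement)[::-1]
        pairs.append((forward, reverse))
    return pairs
```

Varying `insert_size` in such a sketch corresponds to the "varying paired end spacing" mentioned above; a realistic simulator would also model platform-specific sequencing errors, which this example omits.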

2.

The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation (total citations: 1; self-citations: 0; citations by others: 1)

Konstantinos Mavromatis Miriam L. Land Thomas S. Brettin Daniel J. Quest Alex Copeland Alicia Clum Lynne Goodwin Tanja Woyke Alla Lapidus Hans Peter Klenk Robert W. Cottingham Nikos C. Kyrpides 《PloS one》2012,7(12)

Background

The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation.

Methodology/Principal Findings

In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and sources of error in them should be considered prior to analysis.

Conclusion

These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

3.

Display of a Maize cDNA library on baculovirus infected insect cells

Helene Y Meller Harel Veronique Fontaine Hongying Chen Ian M Jones Paul A Millner 《BMC biotechnology》2008,8(1):64

Background

Maize is a good model system for cereal crop genetics and development because of its rich genetic heritage and well-characterized morphology. The sequencing of its genome is well advanced, and new technologies for efficient proteomic analysis are needed. Baculovirus expression systems have been used for the last twenty years to express in insect cells a wide variety of eukaryotic proteins that require complex folding or extensive posttranslational modification. More recently, baculovirus display technologies based on the expression of foreign sequences on the surface of Autographa californica (AcMNPV) have been developed. We investigated the potential of a display methodology for a cDNA library of maize young seedlings.

4.

Transcript length bias in RNA-seq data confounds systems biology

Alicia Oshlack Matthew J Wakefield 《Biology direct》2009,4(1):14-10

5.

M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species

Todd J Treangen Xavier Messeguer 《BMC bioinformatics》2006,7(1):433

Background

Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in depth studies of evolution in closely related species through multiple whole genome comparisons.

6.

DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data

Stacy Duncan Ruchita Sirkanungo Leslie Miller Gregory J Phillips 《BMC bioinformatics》2010,11(1):100

Background

New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments require biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available.

7.

Data structures and compression algorithms for high-throughput sequencing technologies

Kenny Daily Paul Rigor Scott Christley Xiaohui Xie Pierre Baldi 《BMC bioinformatics》

Background

High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data.

8.

An effective approach for identification of in vivo protein-DNA binding sites from paired-end ChIP-Seq data

Congmao Wang Jie Xu Dasheng Zhang Zoe A Wilson Dabing Zhang 《BMC bioinformatics》2010,11(1):81

Background

ChIP-Seq, which combines chromatin immunoprecipitation (ChIP) with high-throughput massively parallel sequencing, is increasingly being used for identification of protein-DNA interactions in vivo in the genome. However, maximizing the effectiveness of data analysis of such sequences requires the development of new algorithms that are able to accurately predict DNA-protein binding sites.

9.

Patterns of tandem repetition in plant whole genome assemblies

Rafael Navajas-Pérez Andrew H. Paterson 《Molecular genetics and genomics : MGG》2009,281(6):579-590

Tandem repeats often confound large genome assemblies. A survey of tandemly arrayed repetitive sequences was carried out in whole genome sequences of the green alga Chlamydomonas reinhardtii, the moss Physcomitrella patens, the monocots rice and sorghum, and the dicots Arabidopsis thaliana, poplar, grapevine, and papaya, in order to test how these assemblies deal with this fraction of DNA. Our results suggest that plant genome assemblies preferentially include tandem repeats composed of shorter monomeric units (especially dinucleotide and 9–30-bp repeats), while higher repetitive units pose more difficulties to assemble. Nevertheless, notwithstanding that currently available sequencing technologies struggle with higher arrays of repeated DNA, major well-known repetitive elements including centromeric and telomeric repeats as well as high copy-number genes, were found to be reasonably well represented. A database including all tandem repeat sequences characterized here was created to benefit future comparative genomic analyses.

10.

Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags

Lingfei Shangguan Jian Han Emrul Kayesh Xin Sun Changqing Zhang Tariq Pervaiz Xicheng Wen Jinggui Fang 《PloS one》2013,8(7)

Background

With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated.

Methodology/Principal Finding

Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is represented by the ratio of chromosome size to genome size (or of scaffold size to genome size), which ranged from 55.31% to nearly 100%. The accuracy of a genome sequence is represented by the ratio of matched ESTs to selected ESTs, where 52.93% to 98.28% and 89.02% to 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influence genome sequence assembly.

Conclusion

The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published.
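The two ratios defined in this abstract, integrity and accuracy, are simple percentages. The sketch below shows one way to compute them; the function names and argument names are assumptions introduced here, and the example inputs in the usage note are toy values, not figures from the study.

```python
def integrity(assembled_bp, genome_bp):
    """Percentage of the estimated genome size covered by the assembled
    chromosome (or scaffold) sequence. Hypothetical helper name."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return 100.0 * assembled_bp / genome_bp


def accuracy(matched_ests, selected_ests):
    """Percentage of the randomly selected ESTs that could be mapped to
    the assembly. Hypothetical helper name."""
    if selected_ests <= 0:
        raise ValueError("selected_ests must be positive")
    return 100.0 * matched_ests / selected_ests
```

For instance, with toy inputs, `integrity(300, 400)` gives 75.0 and `accuracy(45, 50)` gives 90.0.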

11.

Long‐read sequence capture of the haemoglobin gene clusters across codfish species

Siv Nam Khang Hoff Helle T. Baalsrud Ave Tooming‐Klunderud Morten Skage Todd Richmond Gregor Obernosterer Reza Shirzadi Ole Kristian Tørresen Kjetill S. Jakobsen Sissel Jentoft 《Molecular ecology resources》2019,19(1):245-259

Combining high‐throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short‐read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long‐read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom‐made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several, lineage‐specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which has previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long‐read capture as a versatile approach for comparative genomic studies by generation of a cross‐species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes.

12.

Calling SNPs without a reference sequence

Aakrosh Ratan Yu Zhang Vanessa M Hayes Stephan C Schuster Webb Miller 《BMC bioinformatics》2010,11(1):130

Background

The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference and determining differences between them have been reported. Much less has been reported on how to use these technologies to determine genetic differences among individuals of a species for which a reference sequence is not available, which drastically limits the number of species that can easily benefit from these new technologies.

13.

Automated genome mining for natural products

Michael HT Li Peter MU Ung James Zajkowski Sylvie Garneau-Tsodikova David H Sherman 《BMC bioinformatics》2009,10(1):185

Background

Discovery of new medicinal agents from natural sources has largely been an adventitious process based on screening of plant and microbial extracts combined with bioassay-guided identification and natural product structure elucidation. Increasingly rapid and more cost-effective genome sequencing technologies coupled with advanced computational power have converged to transform this trend toward a more rational and predictive pursuit.

14.

High-throughput sequence alignment using Graphics Processing Units (total citations: 1; self-citations: 0; citations by others: 1)

Michael C Schatz Cole Trapnell Arthur L Delcher Amitabh Varshney 《BMC bioinformatics》2007,8(1):474

Background

The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies.

15.

Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus)

Peter A. Larsen R. Alan Harris Yue Liu Shwetha C. Murali C. Ryan Campbell Adam D. Brown Beth A. Sullivan Jennifer Shelton Susan J. Brown Muthuswamy Raveendran Olga Dudchenko Ido Machol Neva C. Durand Muhammad S. Shamim Erez Lieberman Aiden Donna M. Muzny Richard A. Gibbs Anne D. Yoder Jeffrey Rogers Kim C. Worley 《BMC biology》2017,15(1):110

16.

Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing

Tobias Wittkop Jan Baumbach Francisco P Lobo Sven Rahmann 《BMC bioinformatics》2007,8(1):396

Background

Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that has to be efficiently clustered with constant or increased accuracy, at increased speed.

17.

BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes

Helena Staňková Alex R. Hastie Saki Chan Jan Vrána Zuzana Tulpová Marie Kubaláková Paul Visendi Satomi Hayashi Mingcheng Luo Jacqueline Batley David Edwards Jaroslav Doležel Hana Šimková 《Plant biotechnology journal》2016,14(7):1523-1531

The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.
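The N50 statistic quoted in this abstract has a standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length. A minimal sketch follows; the function name `n50` is an assumption, and the abstract does not state which tool the authors used to compute the value.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
```

With the toy input `[2, 3, 4, 5, 6]` (total 20), the two longest contigs sum to 11, which first reaches half the total, so the function returns 5.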

18.

A chromosomal genomics approach to assess and validate the desi and kabuli draft chickpea genome assemblies

Pradeep Ruperao Chon‐Kit Kenneth Chan Sarwar Azam Miroslava Karafiátová Satomi Hayashi Jana Čížková Rachit K. Saxena Hana Šimková Chi Song Jan Vrána Annapurna Chitikineni Paul Visendi Pooran M. Gaur Teresa Millán Karam B. Singh Bunyamin Taran Jun Wang Jacqueline Batley Jaroslav Doležel Rajeev K. Varshney David Edwards 《Plant biotechnology journal》2014,12(6):778-786

With the expansion of next‐generation sequencing technology and advanced bioinformatics, there has been a rapid growth of genome sequencing projects. However, while this technology enables the rapid and cost‐effective assembly of draft genomes, the quality of these assemblies usually falls short of gold standard genome assemblies produced using the more traditional BAC by BAC and Sanger sequencing approaches. Assembly validation is often performed by the physical anchoring of genetically mapped markers, but this is prone to errors and the resolution is usually low, especially towards centromeric regions where recombination is limited. New approaches are required to validate reference genome assemblies. The ability to isolate individual chromosomes combined with next‐generation sequencing permits the validation of genome assemblies at the chromosome level. We demonstrate this approach by the assessment of the recently published chickpea kabuli and desi genomes. While previous genetic analysis suggests that these genomes should be very similar, a comparison of their chromosome sizes and published assemblies highlights significant differences. Our chromosomal genomics analysis highlights short defined regions that appear to have been misassembled in the kabuli genome and identifies large‐scale misassembly in the draft desi genome. The integration of chromosomal genomics tools within genome sequencing projects has the potential to significantly improve the construction and validation of genome assemblies. The approach could be applied both for new genome assemblies as well as published assemblies, and complements currently applied genome assembly strategies.

19.

SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes

Nunzio D'Agostino Alessandra Traini Luigi Frusciante Maria Luisa Chiusano 《BMC plant biology》2009,9(1):142-16

20.

Chromosome‐level hybrid de novo genome assemblies as an attainable option for nonmodel insects

Coline C. Jaworski Carson W. Allan Luciano M. Matzkin 《Molecular ecology resources》2020,20(5):1277-1293

The emergence of third‐generation sequencing (3GS; long‐reads) is bringing closer the goal of chromosome‐size fragments in de novo genome assemblies. This allows the exploration of new and broader questions on genome evolution for a number of nonmodel organisms. However, long‐read technologies result in higher sequencing error rates and therefore impose an elevated cost of sufficient coverage to achieve high enough quality. In this context, hybrid assemblies, combining short‐reads and long‐reads, provide an alternative efficient and cost‐effective approach to generate de novo, chromosome‐level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for nonexperts to find efficient, fast and tractable computational solutions for genome assembly, especially in the case of nonmodel organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a nonmodel cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this nonmodel organism using the dbg2olc pipeline.