Similar literature
 Found 20 similar articles; search took 31 ms
1.
SUMMARY: We provide the graphical tool BACCardI for the construction of virtual clone maps from standard assembler output files or BLAST-based sequence comparisons. This new tool has been applied to numerous genome projects to solve various problems including (a) validation of whole genome shotgun assemblies, (b) support for contig ordering in the finishing phase of a genome project, and (c) intergenome comparison between related strains when only one of the strains has been sequenced and a large insert library is available for the other. The BACCardI software can seamlessly interact with various sequence assembly packages. MOTIVATION: Genomic assemblies generated from sequence information need to be validated by independent methods such as physical maps. The time-consuming task of building physical maps can be circumvented by virtual clone maps derived from read pair information of large insert libraries.

2.
The pooid subfamily of grasses includes some of the most important crop, forage and turf species, such as wheat, barley and Lolium. Developing genomic resources, such as whole-genome physical maps, for analysing the large and complex genomes of these crops and for facilitating biological research in grasses is an important goal in plant biology. We describe a bacterial artificial chromosome (BAC)-based physical map of the wild pooid grass Brachypodium distachyon and integrate this with whole genome shotgun sequence (WGS) assemblies using BAC end sequences (BES). The resulting physical map contains 26 contigs spanning the 272 Mb genome. BES from the physical map were also used to integrate a genetic map. This provides an independent validation and confirmation of the published WGS assembly. Mapped BACs were used in Fluorescence In Situ Hybridisation (FISH) experiments to align the integrated physical map and sequence assemblies to chromosomes with high resolution. The physical, genetic and cytogenetic maps, integrated with whole genome shotgun sequence assemblies, enhance the accuracy and durability of this important genome sequence and will directly facilitate gene isolation.

3.
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.

4.
MOTIVATION: Since the simultaneous publication of the human genome assembly by the International Human Genome Sequencing Consortium (HGSC) and Celera Genomics, several comparisons have been made of various aspects of these two assemblies. In this work, we set out to provide a more comprehensive comparative analysis of the two assemblies and their associated gene sets. RESULTS: The local sequence content for both draft genome assemblies has been similar since the early releases; however, it took a year for the quality of the Celera assembly to approach that of HGSC, suggesting an advantage of HGSC's hierarchical shotgun (HS) sequencing strategy over Celera's whole genome shotgun (WGS) approach. While similar numbers of ab initio predicted genes can be derived from both assemblies, Celera's Otto approach consistently generated larger, more varied gene sets than the Ensembl gene build system. The presence of a non-overlapping gene set has persisted with successive data releases from both groups. Since most of the unique genes from either genome assembly could be mapped back to the other assembly, we conclude that the gene set discrepancies do not reflect differences in local sequence content but rather in the assemblies and especially the different gene-prediction methodologies.

5.
6.
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology.

7.
As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.

8.
A physical mapping strategy has been developed to verify and accelerate the assembly and gap closure phase of a microbial genome shotgun-sequencing project. The protocol was worked out during the ongoing Pseudomonas putida KT2440 genome project. A macro-restriction map was constructed by linking probe hybridisation of SwaI- or I-CeuI-restricted chromosomes to serve as a backbone for the quick quality control of sequence and contig assemblies. The library of PCR-generated SwaI linking probes was derived from the sequence assembly after 3- and 6-fold genome coverage. In order to support gap closure in regions with ambiguous assemblies such as the repetitive sequence of the seven ribosomal operons, high-resolution Smith/Birnstiel maps were generated by Southern hybridisation of pulsed-field gel electrophoresis-separated rare-cutter complete/frequent-cutter partial digestions with rare-cutter fragment end probes. Overall 1.5 Mb of the 6.1 Mb P. putida KT2440 genome has been subjected to high-resolution physical mapping in order to align assemblies generated from shotgun sequencing.

9.
The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references.

10.
Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.

11.
Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12–20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6–15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly.

12.
The development of ultra-dense genetic maps has the potential to facilitate detailed comparative genomic analyses and whole genome sequence assemblies. Here we describe the use of a custom Affymetrix GeneChip containing nearly 2.4 million features (25 bp sequences) targeting 86,023 unigenes from sunflower (Helianthus annuus L.) and related species to test for single-feature polymorphisms (SFPs) in a recombinant inbred line (RIL) mapping population derived from a cross between confectionery and oilseed sunflower lines (RHA280×RHA801). We then employed an existing genetic map derived from this same population to rigorously filter out low quality data and place 67,486 features corresponding to 22,481 unigenes on the sunflower genetic map. The resulting map contains a substantial fraction of all sunflower genes and will thus facilitate a number of downstream applications, including genome assembly and the identification of candidate genes underlying QTL or traits of interest.

13.
Novel sequences are DNA sequences present in an individual's genome but absent in the human reference assembly. They are predicted to be biologically important, both individual and population specific, and consistent with the known human migration paths. Recent works have shown that an average person harbors 2–5 Mb of such sequences and estimated that the human pan-genome contains as much as 19–40 Mb of novel sequences. To identify them in a de novo genome assembly, some existing sequence aligners have been used but no computational method has been specifically proposed for this task. In this work, we developed NSIT (Novel Sequence Identification Tool), a software tool that can accurately and efficiently identify novel sequences in an individual's de novo whole genome assembly. We identified and characterized 1.1 Mb, 1.2 Mb, and 1.0 Mb of novel sequences in NA18507 (African), YH (Asian), and NA12878 (European) de novo genome assemblies, respectively. Our results show very high concordance with the previous work using the respective reference assembly. In addition, our results using the latest human reference assembly suggest that the amount of novel sequences per individual may not be as high as previously reported. We additionally developed a graphical viewer for comparisons of novel sequence contents. The viewer also helped in identifying sequence contamination; we found 130 kb of Epstein-Barr virus sequence in the previously published NA18507 novel sequences as well as 287 kb of zebrafish repeats in NA12878 de novo assembly. NSIT requires 2 GB of RAM and 1.5–2 hours on a commodity desktop. The program is applicable to input assemblies with varying contig/scaffold sizes, ranging from 100 bp to as high as 50 Mb. It works in both 32-bit and 64-bit systems and outperforms, by large margins, other fast sequence aligners previously applied to this task. To our knowledge, NSIT is the first software designed specifically for novel sequence identification in a de novo human genome assembly.

14.
Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

15.
The current generation of genome assembly programs uses distance and orientation relationships of paired end reads of clones (mate pairs) to order and orient contigs. Mate pair data can also be used to evaluate and compare assemblies after the fact. Earlier work employed a simple heuristic to detect assembly problems by scanning across an assembly to locate peak concentrations of unsatisfied mate pairs. TAMPA is a novel, computational geometry-based approach to detecting assembly breakpoints by exploiting constraints that mate pairs impose on each other. The method can be used to improve assemblies and determine which of two assemblies is correct in the case of sequence disagreement. Results from several human genome assemblies are presented.
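
The simple scanning heuristic mentioned in this abstract (not TAMPA's geometric method, which the abstract does not detail) might be sketched roughly as follows. This is only an illustration under stated assumptions: the `MatePair` record, the insert-size tolerance rule, the fixed window size, and the count threshold are all hypothetical choices made here, not taken from the cited work.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MatePair:
    """Hypothetical record of where one mate pair landed in an assembly."""
    left: int      # start coordinate of the leftmost read
    right: int     # end coordinate of the rightmost read
    facing: bool   # True if the two reads point toward each other


def is_satisfied(pair, mean_insert, tolerance):
    """Assumed rule: a pair is satisfied when its reads face each other
    and its span is within `tolerance` of the library's mean insert size."""
    span = pair.right - pair.left
    return pair.facing and abs(span - mean_insert) <= tolerance


def unsatisfied_per_window(pairs, mean_insert, tolerance, window):
    """Count unsatisfied pairs per fixed-size window along the assembly.
    Each unsatisfied pair is charged once to every window holding one of its ends."""
    counts = Counter()
    for pair in pairs:
        if not is_satisfied(pair, mean_insert, tolerance):
            for index in {pair.left // window, pair.right // window}:
                counts[index] += 1
    return counts


def peak_windows(counts, threshold):
    """Return the indices of windows whose unsatisfied-pair count reaches the threshold."""
    return sorted(index for index, n in counts.items() if n >= threshold)
```

Windows returned by `peak_windows` would be candidate misassembly sites to inspect by hand; a real tool would likely normalize by local clone coverage rather than use a raw count.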

16.
Yak is an important livestock animal for the people indigenous to the harsh, oxygen‐limited Qinghai‐Tibetan Plateau and Hindu Kush ranges of the Himalayas. The yak genome was sequenced in 2012, but its assembly was fragmented because of the inherent limitations of the Illumina sequencing technology used to analyse it. An accurate and complete reference genome is essential for the study of genetic variations in this species. Long‐read sequences are more complete than their short‐read counterparts and have been successfully applied towards high‐quality genome assembly for various species. In this study, we present a high‐quality chromosome‐scale yak genome assembly (BosGru_PB_v1.0) constructed with long‐read sequencing and chromatin interaction technologies. Compared to an existing yak genome assembly (BosGru_v2.0), BosGru_PB_v1.0 shows substantially improved chromosome sequence continuity, reduced repetitive structure ambiguity, and gene model completeness. To characterize genetic variation in yak, we generated de novo genome assemblies based on Illumina short reads for seven recognized domestic yak breeds in Tibet and Sichuan and one wild yak from Hoh Xil. We compared these eight assemblies to the BosGru_PB_v1.0 genome, obtained a comprehensive map of yak genetic diversity at the whole‐genome level, and identified several protein‐coding genes absent from the BosGru_PB_v1.0 assembly. Despite the genetic bottleneck experienced by wild yak, their diversity was nonetheless higher than that of domestic yak. Here, we identified breed‐specific sequences and genes by whole‐genome alignment, which may facilitate yak breed identification.

17.
Here, we report a new multi-optical maps scaffolder (MOMS) aiming at utilizing complementary information among optical maps labelled by distinct enzymes. This pipeline was designed for data structure organization, scaffolding by path traversal, gap-filling and molecule reuse of optical maps. Our testing showed that this pipeline has uncapped enzyme tolerance in scaffolding. This means that there are no inbuilt limits as to the number of maps generated by different enzymes that can be utilized by MOMS. For the genome assembly of the human GM12878 cell line, MOMS significantly improved the contiguity and completeness with an up to 144-fold increase of scaffold N50 compared with initial assemblies. Benchmarking on the genomes of human and O. sativa showed that MOMS is more effective and robust compared with other optical-map-based scaffolders. We believe this pipeline will contribute to high-fidelity chromosome assembly and chromosome-level evolutionary analysis.
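
For readers unfamiliar with the scaffold N50 statistic quoted in this abstract, a minimal sketch of how it is conventionally computed may help. The function names and the toy lengths in the usage note are illustrative assumptions; nothing here is taken from the MOMS software itself.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


def fold_increase(before, after):
    """Ratio of the N50 after scaffolding to the N50 before it."""
    return n50(after) / n50(before)
```

With toy scaffold lengths `[2, 3, 4, 5, 6]` the N50 is 5; if scaffolding merged them into `[9, 11]`, the N50 would become 11, a 2.2-fold increase. The "up to 144-fold" figure in the abstract is the same kind of ratio, measured on real assemblies.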

18.
A comprehensive second-generation whole genome radiation hybrid (RH II), cytogenetic and comparative map of the horse genome (2n = 64) has been developed using the 5000-rad horse × hamster radiation hybrid panel and fluorescence in situ hybridization (FISH). The map contains 4,103 markers (3,816 RH; 1,144 FISH) assigned to all 31 pairs of autosomes and the X chromosome. The RH maps of individual chromosomes are anchored and oriented using 857 cytogenetic markers. The overall resolution of the map is one marker per 775 kilobase pairs (kb), which represents a more than five-fold improvement over the first-generation map. The RH II incorporates 920 markers shared jointly with the two recently reported meiotic maps. Consequently the two maps were aligned with the RH II maps of individual autosomes and the X chromosome. Additionally, a comparative map of the horse genome was generated by connecting 1,904 loci on the horse map with genome sequences available for eight diverse vertebrates to highlight regions of evolutionarily conserved syntenies, linkages, and chromosomal breakpoints. The integrated map thus obtained presents the most comprehensive information on the physical and comparative organization of the equine genome and will assist future assemblies of whole genome BAC fingerprint maps and the genome sequence. It will also serve as a tool to identify genes governing health, disease and performance traits in horses and assist us in understanding the evolution of the equine genome in relation to other species.

19.
Whole-Genome Shotgun Optical Mapping of Rhodospirillum rubrum (total citations: 1; self-citations: 0; citations by others: 1)
Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a “molecular cytogenetics” approach to solving problems in genomic analysis.

20.
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics to mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.
