Similar documents
20 similar documents found (search time: 46 ms)
1.
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich and GC‐rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long‐read, linked‐read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC‐rich microchromosomes and the repeat‐rich W chromosome. Telomere‐to‐telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
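Comparing draft assemblies against a reference starts with locating their gaps. The sketch below assumes the common convention that scaffold gaps are written as runs of N; the function `find_gaps` and the example scaffold are hypothetical and not taken from the study.

```python
import re

def find_gaps(seq: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of N runs in a scaffold."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]

# Hypothetical scaffold: two contigs joined by a 5-N gap, then a 10-N gap.
scaffold = "ACGT" + "N" * 5 + "ACGTACG" + "N" * 10 + "T"
print(find_gaps(scaffold))              # [(4, 9), (16, 26)]
print(find_gaps(scaffold, min_len=6))   # [(16, 26)]
```

Gap coordinates found this way can then be intersected with repeat or GC annotations to ask which features coincide with gaps.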

2.
Zoysiagrass (Zoysia spp.), belonging to the genus Zoysia in the subfamily Chloridoideae, is widely used for domestic lawns and sports fields and as forage. We constructed high‐density genetic maps of Zoysia japonica using a restriction site‐associated DNA sequencing (RAD‐Seq) approach and an F1 mapping population derived from a cross between ‘Carrizo’ and ‘El Toro’. Two linkage maps were constructed, one for each of the parents. A map consisting of 2408 RAD markers distributed on 21 linkage groups was constructed for ‘Carrizo’. Another map with 1230 RAD markers mapped on 20 linkage groups was constructed for ‘El Toro’. The average distance between adjacent markers of the two maps was 0.56 and 1.4 cM, respectively. A comparative genomic analysis of zoysiagrass, rice and sorghum revealed highly conserved collinearity in gene order among the three genomes. Chromosome collinearity was disrupted at centromeric regions for each chromosome pair between zoysiagrass and sorghum genomes. However, no obvious synteny gaps were observed across the centromeric regions between zoysiagrass and rice genomes. Two homologous chromosomes for each of the 10 sorghum chromosomes were found in the zoysiagrass genome, indicating an allotetraploid origin for zoysiagrass. The reduction of the basic chromosome number from 12 to 10 in chloridoids and panicoids took place via independent single‐step nested chromosome fusion events after the two subfamilies diverged from a common ancestor. The genetic maps will assist in genome sequence assembly, targeted gene isolation and comparative genomic analyses among grasses.
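The average inter-marker distance quoted for each map can be recomputed from marker positions. A minimal sketch, assuming the statistic is the mean of all adjacent-marker gaps pooled across linkage groups (the paper may define it differently, for example as total map length divided by marker count); the function name and positions are hypothetical.

```python
def mean_adjacent_spacing(linkage_groups: dict[str, list[float]]) -> float:
    """Mean cM distance between adjacent markers, pooled over all linkage groups."""
    gaps = []
    for positions in linkage_groups.values():
        ordered = sorted(positions)
        # Each neighbouring pair on the same group contributes one interval.
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least one group with two or more markers")
    return sum(gaps) / len(gaps)

toy_map = {"LG1": [0.0, 1.0, 3.0], "LG2": [3.0, 0.0]}   # hypothetical positions (cM)
print(mean_adjacent_spacing(toy_map))  # 2.0
```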

3.
A high utility integrated map of the pig genome   Total citations: 2; self-citations: 1; citations by others: 1

Background

The domestic pig is being increasingly exploited as a system for modeling human disease. It also has substantial economic importance for meat-based protein production. Physical clone maps have underpinned large-scale genomic sequencing and enabled focused cloning efforts for many genomes. Comparative genetic maps indicate that there is more structural similarity between pig and human than between, for example, mouse and human, and we have used this close relationship between human and pig as a way of facilitating map construction.

Results

Here we report the construction of the most highly continuous bacterial artificial chromosome (BAC) map of any mammalian genome, for the pig (Sus scrofa domestica) genome. The map provides a template for the generation and assembly of high-quality anchored sequence across the genome. The physical map integrates previous landmark maps with restriction fingerprints and BAC end sequences from over 260,000 BACs derived from 4 BAC libraries and takes advantage of alignments to the human genome to improve the continuity and local ordering of the clone contigs. We estimate that over 98% of the euchromatin of the 18 pig autosomes and the X chromosome along with localized coverage on Y is represented in 172 contigs, with chromosome 13 (218 Mb) represented by a single contig. The map is accessible through pre-Ensembl, where links to marker and sequence data can be found.

Conclusion

The map will enable immediate electronic positional cloning of genes, benefiting the pig research community and further facilitating use of the pig as an alternative animal model for human disease. The clone map and BAC end sequence data can also help to support the assembly of maps and genome sequences of other artiodactyls.

4.
Whole-Genome Shotgun Optical Mapping of Rhodospirillum rubrum   Total citations: 1; self-citations: 0; citations by others: 1
Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source for hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a “molecular cytogenetics” approach to solving problems in genomic analysis.
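An optical map is an ordered list of restriction-fragment sizes, so checking an assembly against one starts from an in silico digest of the sequence. A minimal sketch, assuming a linear sequence and a palindromic recognition site cut on the top strand at `cut_offset` (HindIII cuts A^AGCTT, so its offset is 1). Real R. rubrum replicons are circular, which this hypothetical helper ignores.

```python
def restriction_fragments(seq: str, site: str, cut_offset: int) -> list[int]:
    """Ordered fragment lengths from cutting a linear sequence at every site.

    Assumes a palindromic site, so scanning the top strand finds every cut.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

toy = "GGAAGCTTCCCCAAGCTTGG"                       # hypothetical 20-bp sequence
print(restriction_fragments(toy, "AAGCTT", 1))     # HindIII: [3, 10, 7]
```

Aligning such ordered size lists to the measured optical fragments is what lets a map flag misassemblies and order contigs.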

5.
Genomics, 2022, 114(5): 110441
Chloridea subflexa and Chloridea virescens are a pair of closely related noctuid species exhibiting pheromone-based sexual isolation and divergent host plant preferences. We produced a novel Illumina short read C. subflexa genome assembly and an improved C. virescens genome assembly, which offer opportunities to study the genomic basis for evolutionarily important traits in this lepidopteran family with few genomic resources. We then examined the feasibility of reference-assisted assembly, an approach that leverages existing high quality genomic resources for genome improvement in closely related taxa and applied it to our Heliothine genomes. Our work demonstrates that reference-assisted assembly has the potential to enhance contiguity and completeness of existing insect genomic resources with minimal additional laboratory costs. We conclude by discussing both the potential and pitfalls of reference-assisted assembly according to the intended downstream assembly application.

7.
We have designed and implemented a system to manage whole genome shotgun sequences and whole genome sequence assembly data flow. The Sequence Assembly Manager (SAM) consists primarily of a MySQL relational database and Perl applications designed to easily manipulate and coordinate the analysis of sequence information and to view and report genome assembly progress through its Common Gateway Interface (CGI) web interface. The application includes a tool to compare sequence assemblies to fingerprint maps that has been used successfully to improve and validate both maps and sequence assemblies of the Rhodococcus sp. RHA1 and Cryptococcus neoformans WM276 genomes.

8.
We have developed a multiplex method of genome analysis, restriction landmark genomic scanning (RLGS), that has been used to construct genetic maps in mice. Restriction landmarks are end-labeled restriction fragments of genomic DNA that are separated by high-resolution two-dimensional gel electrophoresis, identifying as many as two thousand landmark loci in a single gel. Variation at several hundred of these loci has been identified between laboratory strains and between these strains and Mus spretus. The segregation of more than 1100 RLGS loci has been analyzed in recombinant inbred (RI) strains and in two separate interspecific genetic crosses. Genetic maps have been derived that link 1045 RLGS loci to reference loci on all of the autosomes and the X chromosome of the mouse genome. The RLGS method can be applied to identify genomic loci in many different organisms because it uses end-labeling of restriction landmarks rather than probe hybridization. Different combinations of restriction enzymes yield different sets of RLGS loci, providing expanded power for genetic mapping.

9.
High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5–18.5 Kbp with an extremely low error rate (0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.
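N50, the contig statistic quoted above, is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A short sketch of that definition (the function name and the toy lengths are illustrative):

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Here the total is 54 bases; 10 + 9 + 8 = 27 reaches half, so N50 is 8.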

10.
As a result of improvements in genome assembly algorithms and the ever decreasing costs of high-throughput sequencing technologies, new high quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assemblies and to insights into how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.
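To illustrate how allele proportions can point to a contig's ploidy, here is a deliberately simplified likelihood model, not ConPADE itself. Under a candidate ploidy k, the alternate-allele count at each heterozygous site is treated as binomial with probability j/k for an unknown dosage j between 1 and k - 1 (uniform prior), with no sequencing-error term. Function names, candidate ploidies and the toy counts are assumptions.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_sum_exp(values: list[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def ploidy_log_likelihood(sites: list[tuple[int, int]], ploidy: int) -> float:
    """sites: (alt_count, depth) pairs at heterozygous positions; ploidy >= 2."""
    total = 0.0
    for alt, depth in sites:
        # Uniform prior over heterozygous dosages 1..ploidy-1; no error model.
        terms = [log_binom_pmf(alt, depth, j / ploidy) for j in range(1, ploidy)]
        total += log_sum_exp(terms) - math.log(ploidy - 1)
    return total

def estimate_ploidy(sites, candidates=(2, 3, 4, 6)) -> int:
    return max(candidates, key=lambda k: ploidy_log_likelihood(sites, k))

tetraploid_like = [(25, 100), (75, 100), (50, 100), (24, 100), (76, 100)]
diploid_like = [(50, 100), (48, 100), (53, 100)]
print(estimate_ploidy(tetraploid_like), estimate_ploidy(diploid_like))
```

Allele fractions clustered near 1/4 and 3/4 favour ploidy 4, while fractions near 1/2 alone favour ploidy 2. As the abstract notes for ConPADE, such signals separate cleanly only with enough coverage.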

11.
The sequencing of the 12 genomes of members of the genus Drosophila was taken as an opportunity to reevaluate the genetic and physical maps for 11 of the species, in part to aid in the mapping of assembled scaffolds. Here, we present an overview of the importance of cytogenetic maps to Drosophila biology and to the concepts of chromosomal evolution. Physical and genetic markers were used to anchor the genome assembly scaffolds to the polytene chromosomal maps for each species. In addition, a computational approach was used to anchor smaller scaffolds on the basis of the analysis of syntenic blocks. We present the chromosomal map data from each of the 11 sequenced non-Drosophila melanogaster species as a series of sections. Each section reviews the history of the polytene chromosome maps for each species, presents the new polytene chromosome maps, and anchors the genomic scaffolds to the cytological maps using genetic and physical markers. The mapping data agree with Muller's idea that the majority of Drosophila genes are syntenic. Despite the conservation of genes within homologous chromosome arms across species, the karyotypes of these species have changed through the fusion of chromosomal arms followed by subsequent rearrangement events.

12.
The assembly and maturation of viruses with icosahedral capsids must be coordinated with icosahedral symmetry. Icosahedral symmetry also imposes restrictions on the cooperative specific interactions between genomic RNA/DNA and coat proteins, which should be reflected in a quasi-regular segmentation of viral genomic sequences. Combining discrete direct and double Fourier transforms, we studied the quasi-regular large-scale segmentation in genomic sequences of different ssRNA, ssDNA, and dsDNA viruses. The particular representatives included satellite tobacco mosaic virus (STMV), the satellite tobacco necrosis virus (STNV) strains STNV-C, STNV-1 and STNV-2, the Escherichia phages MS2, φX174, α3 and HK97, and Simian virus 40. In all of these genomes, we found significant quasi-regular segmentation of the genomic sequences related to virion assembly and genome packaging within the icosahedral capsid. We also found good correspondence between our results and available cryo-electron microscopy data on capsid structures and genome packaging in these viruses. Fourier analysis of genomic sequences provides additional insight into the mechanisms of hierarchical genome packaging and may be used to verify the concepts of 3-fold or 5-fold intermediates in virion assembly. The results of sequence analysis should be taken into account when choosing models and interpreting data. They may also be helpful for the development of antiviral drugs.
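As a rough illustration of how a Fourier transform exposes regular segmentation, the sketch below computes the discrete Fourier power spectrum of a 0/1 indicator sequence for one nucleotide and reports the period of the strongest non-zero frequency. This single, naive O(N²) transform is an assumption for demonstration only; it is not the direct/double Fourier procedure used in the study, and the toy sequence is hypothetical.

```python
import cmath

def power_spectrum(seq: str, base: str) -> list[float]:
    """|X_k|^2 of the 0/1 indicator of `base` along `seq` (naive O(N^2) DFT)."""
    x = [1.0 if c == base.upper() else 0.0 for c in seq.upper()]
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]

def dominant_period(seq: str, base: str) -> float:
    """Period (in nucleotides) of the strongest non-zero frequency."""
    spec = power_spectrum(seq, base)
    n = len(spec)
    # Skip k = 0 (the mean) and the mirrored upper half of the spectrum.
    k_best = max(range(1, n // 2 + 1), key=lambda k: spec[k])
    return n / k_best

toy = "AAAAACCCCC" * 20   # A-rich blocks recurring every 10 nt
print(dominant_period(toy, "A"))  # 10.0
```

For real genomes one would use an FFT library and look for significant peaks rather than a single maximum.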

13.
The pooid subfamily of grasses includes some of the most important crop, forage and turf species, such as wheat, barley and Lolium. Developing genomic resources, such as whole-genome physical maps, for analysing the large and complex genomes of these crops and for facilitating biological research in grasses is an important goal in plant biology. We describe a bacterial artificial chromosome (BAC)-based physical map of the wild pooid grass Brachypodium distachyon and integrate this with whole genome shotgun sequence (WGS) assemblies using BAC end sequences (BES). The resulting physical map contains 26 contigs spanning the 272 Mb genome. BES from the physical map were also used to integrate a genetic map. This provides an independent validation and confirmation of the published WGS assembly. Mapped BACs were used in Fluorescence In Situ Hybridisation (FISH) experiments to align the integrated physical map and sequence assemblies to chromosomes with high resolution. The physical, genetic and cytogenetic maps, integrated with whole genome shotgun sequence assemblies, enhance the accuracy and durability of this important genome sequence and will directly facilitate gene isolation.

14.
The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

15.
Thanks to a dramatic reduction in sequencing costs, followed by the rapid development of bioinformatics tools, genome assembly and annotation have become accessible to many researchers in recent years. Among tetrapods, birds have genomes that display many features that facilitate their assembly and annotation, such as small genome size, low number of repeats and highly conserved genomic structure. However, we found that high genomic heterozygosity could have a great impact on the quality of the genome assembly of the thick‐billed murre (Uria lomvia), an arctic colonial seabird. In this study, we tested the performance of three genome assemblers, Ray/SSPACE, SOAPdenovo2 and Platanus, in assembling the highly heterozygous genome of the thick‐billed murre. Our results show that Platanus, an assembler specifically designed for heterozygous genomes, outperforms the other two approaches and produces a highly contiguous (N50 = 15.8 Mb) and complete genome assembly (93% presence of genes from the Benchmarking Universal Single Copy Ortholog [BUSCO] gene set). Additionally, we annotated the thick‐billed murre genome using a homology‐based approach that takes advantage of the genomic resources available for birds and other taxa. Our study will be useful for those researchers who are approaching assembly and annotation of highly heterozygous genomes, or genomes of species of conservation concern, and/or who have limited financial resources.

16.
Recently a number of computational approaches have been developed for the prediction of protein–protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.
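One well-known genomic-context method is the phylogenetic profile: proteins whose presence/absence pattern across complete genomes is identical, or nearly so, are predicted to be functionally linked. A minimal sketch, assuming binary profiles over a fixed genome order and a Hamming-distance cutoff; the names, cutoff and toy profiles are hypothetical, not a protocol from this chapter.

```python
def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of genomes in which the two presence/absence calls differ."""
    return sum(x != y for x, y in zip(a, b))

def predict_links(profiles: dict[str, tuple[int, ...]], max_distance: int = 0):
    """Protein pairs whose profiles differ in at most `max_distance` genomes."""
    names = sorted(profiles)
    links = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if hamming(profiles[p], profiles[q]) <= max_distance:
                links.append((p, q))
    return links

# Hypothetical presence (1) / absence (0) across five complete genomes.
profiles = {"A": (1, 0, 1, 1, 0), "B": (1, 0, 1, 1, 0),
            "C": (0, 1, 1, 0, 1), "D": (1, 0, 1, 0, 0)}
print(predict_links(profiles))                  # [('A', 'B')]
print(predict_links(profiles, max_distance=1))  # [('A', 'B'), ('A', 'D'), ('B', 'D')]
```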

17.
Following the discovery of Acanthamoeba polyphaga mimivirus, diverse giant viruses have been isolated. However, only a small fraction of these isolates have been completely sequenced, limiting our understanding of the genomic diversity of giant viruses. MinION is a portable and low-cost long-read sequencer that can be readily used in a laboratory. Although MinION provides highly error-prone reads that require correction through additional short-read sequencing, recent studies assembled high-quality microbial genomes only using MinION sequencing. Here, we evaluated the accuracy of MinION-only genome assemblies for giant viruses by re-sequencing a prototype marseillevirus. Assembled genomes presented over 99.98% identity to the reference genome with a few gaps, demonstrating a high accuracy of the MinION-only assembly. As a proof of concept, we de novo assembled five newly isolated viruses. Average nucleotide identities to their closest known relatives suggest that the isolates represent new species of marseillevirus, pithovirus and mimivirus. The assembly of subsampled reads demonstrated that their taxonomy and genomic composition could be analysed at the 50× sequencing coverage. We also identified a pithovirus gene whose homologues were detected only in metagenome-derived relatives. Collectively, we propose that MinION-only assembly is an effective approach to rapidly perform a genome-wide analysis of isolated giant viruses.
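Sequencing coverage is total read bases divided by genome size, and testing an assembly at 50× implies subsampling a larger read set down to that depth. A minimal sketch, assuming random sampling of whole reads without replacement; helper names, read lengths and genome size are hypothetical.

```python
import random

def coverage(read_lengths: list[int], genome_size: int) -> float:
    """Mean depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size

def subsample_to_coverage(read_lengths, genome_size: int, target: float, seed: int = 0):
    """Indices of randomly ordered reads kept until `target` x coverage is reached.

    Returns every read if the input cannot reach the target.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    kept, bases = [], 0
    for i in order:
        if bases >= target * genome_size:
            break
        kept.append(i)
        bases += read_lengths[i]
    return kept

reads = [5_000] * 20_000          # 100 Mb of hypothetical reads
genome = 1_000_000                # hypothetical 1 Mb genome -> 100x
subset = subsample_to_coverage(reads, genome, target=50)
print(coverage(reads, genome), coverage([reads[i] for i in subset], genome))  # 100.0 50.0
```

Repeating the subsampling with several seeds would show how stable the assembly is at a given depth.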

18.
HAPPY mapping is an in vitro approach for defining the order and spacing of DNA markers directly on native genomic DNA. This cloning-free technique is based on analysing the segregation of markers amplified from high molecular weight genomic DNA which has been broken randomly and 'segregated' by limiting dilution into subhaploid samples. It is a uniquely versatile tool, allowing for the construction of genome maps with flexible ranges and resolutions. Moreover, it is applicable to plant genomes, for which many of the techniques pioneered in animal genomes are inapplicable or inappropriate. We report here its demonstration in a plant genome by reconstructing the physical map of a 1.9 Mbp region around the FCA locus of Arabidopsis thaliana. The resulting map, spanning around 10% of chromosome 4, is in excellent agreement with the DNA sequence and has a mean marker spacing of 16 kbp. We argue that HAPPY maps of any required resolution can be made immediately and with relatively little effort for most plant species and, furthermore, that such maps can greatly aid the construction of regional or genome-wide physical maps.
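The principle behind HAPPY mapping is that markers lying close together on the same DNA molecule tend to be present or absent together across the subhaploid samples. A minimal sketch of that signal, using the excess of observed over expected co-occurrence; this is an illustrative statistic chosen for brevity, not the maximum-likelihood distance estimator used to build published HAPPY maps, and the marker calls are hypothetical.

```python
def coseg_excess(a: list[int], b: list[int]) -> float:
    """Observed minus expected frequency of samples containing both markers.

    a, b: 0/1 presence calls for two markers across the same ordered samples.
    Values near 0 suggest independent segregation; clearly positive values
    suggest the markers are close together on the genome.
    """
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x and y) / n
    return p_ab - p_a * p_b

m1 = [1, 1, 0, 0, 1, 0, 1, 0]
m2 = [1, 1, 0, 0, 1, 0, 0, 0]   # usually co-retained with m1
m3 = [0, 1, 1, 0, 0, 1, 0, 1]   # mostly segregates away from m1
print(coseg_excess(m1, m2), coseg_excess(m1, m3))  # 0.1875 -0.125
```

Real HAPPY panels use many more samples, so chance fluctuations like the negative value for m3 shrink toward zero for unlinked markers.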

20.
Comparison of genomic maps is hampered by errors and ambiguities introduced by mapping technology, incorrectly resolved paralogy, small samples of markers, and extensive genome rearrangement. We design an analysis to remove or resolve most of these problems and to extract corrected data where markers occur in consecutive strips in both genomes. To do this, we introduce the notion of a prestrip, an efficient way of generating prestrips, and a compatibility analysis culminating in a maximum weighted clique (MWC) search. The output can be directly analyzed with genome rearrangement algorithms, allowing the restoration of some of the data not incorporated into the clique solution. We investigate the trade-offs in choosing criteria for discarding excess prestrips so that the MWC search remains feasible, balancing the retention of as many markers as possible in the solution against an economical rearrangement analysis. We explore these questions through simulation and through comparison of the rice and sorghum genomes.
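To make the clique step concrete: if each prestrip is a vertex weighted by its marker count and compatible prestrips are joined by edges, the MWC is the mutually compatible set with the largest total weight. The exhaustive sketch below is exponential and only suitable for toy inputs; the weighting scheme, names and data are assumptions, not the authors' algorithm.

```python
from itertools import combinations

def max_weight_clique(weights: dict[str, int], edges: set[frozenset[str]]):
    """Exhaustive maximum weighted clique search (exponential; toy inputs only).

    weights: vertex -> non-negative weight; edges: set of frozenset({u, v}).
    """
    vertices = sorted(weights)
    best, best_weight = set(), 0
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            # A clique requires every pair in the subset to be compatible.
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                w = sum(weights[v] for v in subset)
                if w > best_weight:
                    best, best_weight = set(subset), w
    return best, best_weight

weights = {"p1": 3, "p2": 2, "p3": 4, "p4": 1}   # hypothetical markers per prestrip
compatible = {frozenset(e) for e in [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p3", "p4")]}
print(max_weight_clique(weights, compatible))
```

Here {p1, p2, p3} (weight 9) beats {p3, p4} (weight 5); discarding prestrips beforehand, as the abstract describes, is what keeps the real search tractable.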


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: Jing ICP Bei No. 09084417