Similar literature
 20 similar documents found (search time: 15 ms)
1.
The success in completely sequencing "small" genomes and the development of new technologies that sharply accelerate cloning and sequencing have made possible the intensive development of plant genomics and the complete sequencing of the DNA of some species. It is assumed that success in plant genomics will bring revolutionary changes to biotechnology and plant breeding. However, the enormous size of plant genomes (tens of billions of bp), their extraordinary enrichment in repetitive sequences, and allopolyploidy (the presence in a nucleus of several related but not identical genomes) suggest that only a few "basic" plant species will undergo complete sequencing, whereas genome investigations in other species will follow the principles of comparative genomics. At present, sequencing of the Arabidopsis genome (125 Mbp) is complete and that of the rice genome (about 430 Mbp) is nearly finished. Studies of other plant genomes, including economically valuable ones, have already begun on the basis of these investigations. The peculiarities of plant genomes make knowledge of plant chromosomes extraordinarily important, which in turn requires expanding research in this direction and developing new chromosome technologies, including DNA-sparing methods of high-resolution banding.

2.
《Genomics》2020,112(5):3150-3156
Fungal genomes display incredible levels of complexity and diversity, and are exceptional study systems for genome evolution. Here we used the Oxford Nanopore MinION sequencing platform to generate high-quality fungal genomes from complex metagenomic samples of lichen thalli. We sequenced two wolf lichens using one flow cell per sample, generating 17.1 Gbps for Letharia lupina and 14.3 Gbps for Letharia columbiana. The resulting L. lupina genome is one of the most contiguous lichen genomes available to date, with 49.2 Mbp contained on 31 contigs. The L. columbiana genome, while less contiguous, is still relatively high quality, with 52.3 Mbp on a total of 161 contigs. Each thallus for both species contained multiple distinct haplotypes, a phenomenon that has rarely been empirically demonstrated. The Oxford Nanopore sequencing technologies are robust and effective when applied to complex symbioses, and have the potential to fundamentally transform our understanding of fungal genetics.

3.
Zelenin A. V., Badaeva E. D., Muravenko O. V. 《Molecular Biology》2001,35(3):285-293
The success in complete sequencing of small genomes and the development of new technologies that markedly speed up cloning and sequencing open the way to intense development of plant genomics and complete sequencing of the DNA of some species. It is assumed that success in plant genomics will result in revolutionary changes in biotechnology and plant breeding. However, the enormous size of genomes (tens of billions of base pairs), their extraordinary abundance of repetitive sequences, and allopolyploidy (the presence in a nucleus of several related but not identical genomes) suggest that only a few basic plant species will undergo complete sequencing, whereas genome investigations in other species will follow the principles of comparative genomics. At present, sequencing of the Arabidopsis genome (125 Mbp) is complete and that of the rice genome (about 430 Mbp) is close to its end. Study of the genomes of other plants, including economically valuable ones, has already begun on the basis of these works. The peculiarities of plant genomes make detailed knowledge of plant chromosomes extraordinarily important, which in turn calls for expansion of research in this direction and development of new chromosome technologies, including DNA-sparing methods of high-resolution banding.

4.
Researchers have assembled thousands of eukaryotic genomes using Illumina reads, but traditional mate‐pair libraries cannot span all repetitive elements, resulting in highly fragmented assemblies. However, both chromosome conformation capture techniques, such as Hi‐C and Dovetail Genomics Chicago libraries and long‐read sequencing, such as Pacific Biosciences and Oxford Nanopore, help span and resolve repetitive regions and therefore improve genome assemblies. One important livestock species of arid regions that does not have a high‐quality contiguous reference genome is the dromedary (Camelus dromedarius). Draft genomes exist but are highly fragmented, and a high‐quality reference genome is needed to understand adaptation to desert environments and artificial selection during domestication. Dromedaries are among the last livestock species to have been domesticated, and together with wild and domestic Bactrian camels, they are the only representatives of the Camelini tribe, which highlights their evolutionary significance. Here we describe our efforts to improve the North African dromedary genome. We used Chicago and Hi‐C sequencing libraries from Dovetail Genomics to resolve the order of previously assembled contigs, producing almost chromosome‐level scaffolds. Remaining gaps were filled with Pacific Biosciences long reads, and then scaffolds were comparatively mapped to chromosomes. Long reads added 99.32 Mbp to the total length of the new assembly. Dovetail Chicago and Hi‐C libraries increased the longest scaffold over 12‐fold, from 9.71 Mbp to 124.99 Mbp and the scaffold N50 over 50‐fold, from 1.48 Mbp to 75.02 Mbp. We demonstrate that Illumina de novo assemblies can be substantially upgraded by combining chromosome conformation capture and long‐read sequencing.
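
The fold increases quoted in this abstract can be checked with simple arithmetic, and the N50 statistic it reports is easy to compute. The following is a minimal sketch, assuming the usual definition of N50 (the length L such that sequences of length L or longer contain at least half of the total assembly length). The function names and the toy scaffold lengths are hypothetical; only the four Mbp values are taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_change(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


# Values reported in the abstract (Mbp).
longest_scaffold_fold = fold_change(9.71, 124.99)
scaffold_n50_fold = fold_change(1.48, 75.02)

# Hypothetical toy assembly, only to illustrate the N50 definition.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_scaffolds)

print(f"longest scaffold: {longest_scaffold_fold:.1f}-fold")
print(f"scaffold N50: {scaffold_n50_fold:.1f}-fold")
print(f"toy N50: {toy_n50}")
```

Both ratios agree with the "over 12-fold" and "over 50-fold" wording of the abstract.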

5.
6.

Background

Pseudomonas aeruginosa is an important opportunistic pathogen responsible for many infections in hospitalized and immunocompromised patients. Previous reports estimated that approximately 10% of its 6.6 Mbp genome varies from strain to strain and is therefore referred to as “accessory genome”. Elements within the accessory genome of P. aeruginosa have been associated with differences in virulence and antibiotic resistance. As whole genome sequencing of bacterial strains becomes more widespread and cost-effective, methods to quickly and reliably identify accessory genomic elements in newly sequenced P. aeruginosa genomes will be needed.

Results

We developed a bioinformatic method for identifying the accessory genome of P. aeruginosa. First, the core genome was determined based on sequence conserved among the completed genomes of twelve reference strains using Spine, a software program developed for this purpose. The core genome was 5.84 Mbp in size and contained 5,316 coding sequences. We then developed an in silico genome subtraction program named AGEnt to filter out core genomic sequences from P. aeruginosa whole genomes to identify accessory genomic sequences of these reference strains. This analysis determined that the accessory genome of P. aeruginosa ranged from 6.9-18.0% of the total genome, was enriched for genes associated with mobile elements, and was comprised of a majority of genes with unknown or unclear function. Using these genomes, we showed that AGEnt performed well compared to other publicly available programs designed to detect accessory genomic elements. We then demonstrated the utility of the AGEnt program by applying it to the draft genomes of two previously unsequenced P. aeruginosa strains, PA99 and PA103.

Conclusions

The P. aeruginosa genome is rich in accessory genetic material. The AGEnt program accurately identified the accessory genomes of newly sequenced P. aeruginosa strains, even when draft genomes were used. As P. aeruginosa genomes become available at an increasingly rapid pace, this program will be useful in cataloging the expanding accessory genome of this bacterium and in discerning correlations between phenotype and accessory genome makeup. The combination of Spine and AGEnt should be useful in defining the accessory genomes of other bacterial species as well.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-737) contains supplementary material, which is available to authorized users.

7.
Along with methane, methanol and methylated amines represent important biogenic atmospheric constituents; thus, not only methanotrophs but also nonmethanotrophic methylotrophs play a significant role in global carbon cycling. The complete genome of a model obligate methanol and methylamine utilizer, Methylobacillus flagellatus (strain KT) was sequenced. The genome is represented by a single circular chromosome of approximately 3 Mbp, potentially encoding a total of 2,766 proteins. Based on genome analysis as well as the results from previous genetic and mutational analyses, methylotrophy is enabled by methanol and methylamine dehydrogenases and their specific electron transport chain components, the tetrahydromethanopterin-linked formaldehyde oxidation pathway and the assimilatory and dissimilatory ribulose monophosphate cycles, and by a formate dehydrogenase. Some of the methylotrophy genes are present in more than one (identical or nonidentical) copy. The obligate dependence on single-carbon compounds appears to be due to the incomplete tricarboxylic acid cycle, as no genes potentially encoding alpha-ketoglutarate, malate, or succinate dehydrogenases are identifiable. The genome of M. flagellatus was compared in terms of methylotrophy functions to the previously sequenced genomes of three methylotrophs, Methylobacterium extorquens (an alphaproteobacterium, 7 Mbp), Methylibium petroleiphilum (a betaproteobacterium, 4 Mbp), and Methylococcus capsulatus (a gammaproteobacterium, 3.3 Mbp). Strikingly, metabolically and/or phylogenetically, the methylotrophy functions in M. flagellatus were more similar to those in M. capsulatus and M. extorquens than to the ones in the more closely related M. petroleiphilum species, providing the first genomic evidence for the polyphyletic origin of methylotrophy in Betaproteobacteria.

8.
Although the nuclear genome of banana (Musa spp.) is relatively small (1C approximately 610 Mbp for M. acuminata), the results obtained from other sequenced genomes suggest that more than half of the banana genome may be composed of repetitive and non-coding DNA sequences. Knowledge of repetitive DNA can facilitate mapping of important traits, phylogenetic studies, BAC-based physical mapping, and genome sequencing/annotation. However, only a few repetitive DNA sequences have been characterized in banana. In this work, we used DNA reassociation kinetics to isolate the highly repeated fraction of the banana genome (M. acuminata 'Calcutta 4'). Two libraries, one prepared from Cot

9.
Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

10.
The genomes of flowering plants vary in size from about 0.1 to over 100 gigabase pairs (Gbp), mostly because of polyploidy and variation in the abundance of repetitive elements in intergenic regions. High-quality sequences of the relatively small genomes of Arabidopsis (0.14 Gbp) and rice (0.4 Gbp) have now been largely completed. The sequencing of plant genomes that have a more representative size (the mean for flowering plant genomes is 5.6 Gbp) has been seen as a daunting task, partly because of their size and partly because of the numerous highly conserved repeats. Nevertheless, creative strategies and powerful new tools have been generated recently in the plant genetics community, so that sequencing large plant genomes is now a realistic possibility. Maize (2.4-2.7 Gbp) will be the first gigabase-size plant genome to be sequenced using these novel approaches. Pilot studies on maize indicate that the new gene-enrichment, gene-finishing and gene-orientation technologies are efficient, robust and comprehensive. These strategies will succeed in sequencing the gene-space of large genome plants, and in locating all of these genes and adjacent sequences on the genetic and physical maps.

11.
Nuclear holoploid genome sizes (C-values) have been estimated to vary about 800-fold in angiosperms, with the smallest established 1C-value of 157 Mbp recorded in Arabidopsis thaliana. In the highly specialized carnivorous family Lentibulariaceae, three taxa have now been found that exhibit significantly lower values: Genlisea margaretae with 63 Mbp, G. aurea with 64 Mbp, and Utricularia gibba with 88 Mbp. The smallest mitotic anaphase chromatids in G. aurea have 2.1 Mbp and are thus of bacterial size (NB: E. coli has ca. 4 Mbp). Several Utricularia species range somewhat lower than A. thaliana or are similar in genome size. The highest 1C-value known from species of Lentibulariaceae was found in Genlisea hispidula with 1510 Mbp, which results in about 24-fold variation for Genlisea and the Lentibulariaceae. Taking into account these new measurements, genome size variation in angiosperms is now almost 2000-fold. Genlisea and Utricularia are plants with terminal positions in the phylogeny of the eudicots, so the findings are relevant for the understanding of genome miniaturization. Moreover, the Genlisea-Utricularia clade exhibits one of the highest mutational rates in several genomic regions in angiosperms, which may be linked to specialized patterns of genome evolution. Ultrasmall genomes have not been found in Pinguicula, which is the sister group of the Genlisea-Utricularia clade and which does not show accelerated mutational rates. C-values in Pinguicula varied only 1.7-fold, from 487 to 829 Mbp.
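
The fold ranges in this abstract follow directly from the quoted 1C-values. The sketch below recomputes them. It assumes that the largest known angiosperm genome is unchanged, so that the earlier 800-fold range simply scales with the ratio of the old minimum (157 Mbp) to the new minimum (63 Mbp). That assumption and the function name are illustrative and are not stated in the abstract.

```python
def fold_range(smallest, largest):
    """Return the fold variation between the smallest and largest value."""
    return largest / smallest


# 1C-values in Mbp, as quoted in the abstract.
genlisea_fold = fold_range(63, 1510)    # G. margaretae vs G. hispidula
pinguicula_fold = fold_range(487, 829)  # range reported for Pinguicula

# Assumption: the largest angiosperm genome is unchanged, so the previous
# 800-fold range is rescaled by old minimum / new minimum.
angiosperm_fold = 800 * (157 / 63)

print(f"Genlisea / Lentibulariaceae: {genlisea_fold:.1f}-fold")
print(f"Pinguicula: {pinguicula_fold:.2f}-fold")
print(f"angiosperms: {angiosperm_fold:.0f}-fold")
```

Under that assumption the result is consistent with the "about 24-fold", "1.7-fold" and "almost 2000-fold" figures in the text.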

12.

Background  

The decrease in sequencing cost and improvements in technology have made the re-sequencing of large genomes and the parallel sequencing of small genomes easier and more common. It is possible to completely sequence a small genome within days, and this increases the number of publicly available genomes. Among the genomes being rapidly sequenced are those of microbes and viruses responsible for infectious diseases. However, accurate gene prediction remains a challenge in decoding a newly sequenced genome. Therefore, accurate and efficient gene prediction programs are highly desired for rapid and cost-effective surveillance of RNA viruses through full genome sequencing.

13.
Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle‐scale barcodes. Next‐generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high‐quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, we analysed all known angiosperm chloroplast genomes available to date and then designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long‐range PCR and sequenced using next‐generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, the complete chloroplast genomes of eight species representing major groups of angiosperms, that is, early‐diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provided an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs can accelerate genome‐scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms.

14.
Optimal integration of next-generation sequencing into mainstream research requires re-evaluation of how problems can be reasonably overcome and what questions can be asked. One potential application is the rapid acquisition of genomic information to identify microsatellite loci for evolutionary, population genetic and chromosome linkage mapping research on non-model and not previously sequenced organisms. Here, we report on results using high-throughput sequencing to obtain a large number of microsatellite loci from the venomous snake Agkistrodon contortrix, the copperhead. We used the 454 Genome Sequencer FLX next-generation sequencing platform to sample randomly ∼27 Mbp (128 773 reads) of the copperhead genome, thus sampling about 2% of the genome of this species. We identified microsatellite loci in 11.3% of all reads obtained, with 14 612 microsatellite loci identified in total, 4564 of which had flanking sequences suitable for polymerase chain reaction primer design. The random sequencing-based approach to identify microsatellites was rapid, cost-effective and identified thousands of useful microsatellite loci in a previously unstudied species.

15.
Aphids are sap-feeding insects that host a range of bacterial endosymbionts including the obligate, nutritional mutualist Buchnera plus several bacteria that are not required for host survival. Among the latter, 'Candidatus Regiella insecticola' and 'Candidatus Hamiltonella defensa' are found in pea aphids and other hosts and have been shown to protect aphids from natural enemies. We have sequenced almost the entire genome of R. insecticola (2.07 Mbp) and compared it with the recently published genome of H. defensa (2.11 Mbp). Although the two are sister species, their genomes are highly rearranged and have only ~55% of genes in common. The functions encoded by the shared genes imply that the bacteria have similar metabolic capabilities, including only two essential amino acid biosynthetic pathways and active uptake mechanisms for the remaining eight, and similar capacities for host cell toxicity and invasion (type 3 secretion systems and RTX toxins). These observations, combined with high sequence divergence of orthologues, strongly suggest an ancient divergence after establishment of a symbiotic lifestyle. The divergence in gene sets and in genome architecture implies a history of rampant recombination and gene inactivation and the ongoing integration of mobile DNA (insertion sequence elements, prophage and plasmids).

16.
O'Brien HE  Gong Y  Fung P  Wang PW  Guttman DS 《PloS one》2011,6(11):e27199
Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.

17.
Actinobacteria such as streptomycetes are renowned for their ability to produce bioactive natural products including nonribosomal peptides (NRPs) and polyketides (PKs). The advent of genome sequencing has revealed an even larger genetic repertoire for secondary metabolism with most of the small molecule products of these gene clusters still unknown. Here, we employed a "protein-first" method called PrISM (Proteomic Investigation of Secondary Metabolism) to screen 26 unsequenced actinomycetes using mass spectrometry-based proteomics for the targeted detection of expressed nonribosomal peptide synthetases or polyketide synthases. Improvements to the original PrISM screening approach (Nat. Biotechnol. 2009, 27, 951-956), for example, improved de novo peptide sequencing, have enabled the discovery of 10 NRPS/PKS gene clusters from 6 strains. Taking advantage of the concurrence of biosynthetic enzymes and the secondary metabolites they generate, two natural products were associated with their previously "orphan" gene clusters. This work has demonstrated the feasibility of a proteomics-based strategy for use in screening for NRP/PK production in actinomycetes (often >8 Mbp, high GC genomes) versus the bacilli (2-4 Mbp genomes) used previously.

18.
Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4× coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element–like and long interspersed element–like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark contains four putative Hox clusters, indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

19.
The Staphylococcus carnosus genome has the highest GC content of all sequenced staphylococcal genomes, with 34.6%, and therefore represents a species that is set apart from S. aureus, S. epidermidis, S. saprophyticus, and S. haemolyticus. With only 2.56 Mbp, the genome belongs to a family of smaller staphylococcal genomes, and the ori and ter regions are asymmetrically arranged with the replichores I (1.05 Mbp) and II (1.5 Mbp). The events leading up to this asymmetry probably occurred not that long ago in evolution, as there was not enough time to approach the natural tendency of a physical balance. Unlike the genomes of pathogenic species, the TM300 genome does not contain mobile elements such as plasmids, insertion sequences, transposons, or STAR elements; also, the number of repeat sequences is markedly decreased, suggesting a comparatively high stability of the genome. While most S. aureus genomes contain several prophages and genomic islands, the TM300 genome contains only one prophage, ΦTM300, and one genomic island, νSCA1, which is characterized by a mosaic structure mainly composed of species-specific genes. Most of the metabolic core pathways are present in the genome. Some open reading frames are truncated, which reflects the nutrient-rich environment of the meat starter culture, making some functions dispensable. The genome is well equipped with all functions necessary for the starter culture, such as nitrate/nitrite reduction, various sugar degradation pathways, two catalases, and nine osmoprotection systems. The genome lacks most of the toxins typical of S. aureus as well as genes involved in biofilm formation, underscoring the nonpathogenic status.

20.
The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation of this approach is the need for an a priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de novo genome assembly. To overcome this limitation we propose a general framework that uses the genomes of related organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
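
The filtering step described in this abstract amounts to a set difference: variants called for the mutant against the mediator genome, minus those that also appear when the WT is called against the same mediator. The sketch below illustrates that idea only and is not the authors' pipeline. It assumes variant calling has already been done and that each variant is keyed by mediator position, reference base and alternative base. The `Variant` type, the function name and all positions are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A single-nucleotide difference relative to the mediator genome."""
    position: int  # 1-based coordinate on the mediator genome
    ref: str       # base in the mediator
    alt: str       # base observed in the sequenced strain


def mutant_specific(mutant_vs_mediator: Iterable[Variant],
                    wt_vs_mediator: Iterable[Variant]) -> List[Variant]:
    """Keep mutant variants that do not recur in the WT.

    Differences shared by mutant and WT reflect divergence from the
    mediator, not the mutation of interest, so they are discarded.
    """
    shared_with_wt = set(wt_vs_mediator)
    return sorted(v for v in set(mutant_vs_mediator)
                  if v not in shared_with_wt)


# Hypothetical calls: two background differences plus one acquired mutation.
wt_calls = [Variant(120, "A", "G"), Variant(4510, "C", "T")]
mutant_calls = wt_calls + [Variant(9876, "G", "A")]

candidates = mutant_specific(mutant_calls, wt_calls)
print(candidates)
```

In practice the candidate list would still need filtering for coverage and calling errors, which this sketch leaves out.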
