Similar literature
 Found 20 similar articles (search time: 28 ms)
1.
The gene neighborhood in prokaryotic genomes has been effectively utilized in inferring co-functional networks in various organisms. Previously, such genomic context information has been sought among completely assembled prokaryotic genomes. Here, we present a method to infer functional gene networks according to the gene neighborhood in metagenome contigs, which are incompletely assembled genomic fragments. Given that the amount of metagenome sequence data has now surpassed that of completely assembled prokaryotic genomes in the public domain, we expect benefits of inferring networks by the metagenome-based gene neighborhood. We generated co-functional networks for diverse taxonomical species using metagenomics contigs derived from the human microbiome and the ocean microbiome. We found that the networks based on the metagenome gene neighborhood outperformed those based on 1748 completely assembled prokaryotic genomes. We also demonstrated that the metagenome-based gene neighborhood could predict genes related to virulence-associated phenotypes in a bacterial pathogen, indicating that metagenome-based functional links could be sufficiently predictive for some phenotypes of medical importance. Owing to the exponential growth of metagenome sequence data in public repositories, metagenome-based inference of co-functional networks will facilitate understanding of gene functions and pathways in diverse species.

2.
It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: “What have we learned from this vast amount of new genomic data?” Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity—even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. 
The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information.

3.
The intestinal spirochete Brachyspira hyodysenteriae is an important pathogen in swine, causing mucohemorrhagic colitis in a disease known as swine dysentery. Based on the detection of significant linkage disequilibrium in multilocus sequence data, the species is considered to be clonal. An analysis of the genome sequence of Western Australian B. hyodysenteriae strain WA1 has been published, and in the current study 19 further strains from countries around the world were sequenced with Illumina technology. The genomes were assembled and aligned to over 97.5% of the reference WA1 genome at a percentage sequence identity better than 80%. Strain regions not aligned to the reference ranged between 0.2 and 2.5%. Clustering of the strain genes found on average 2,354 (88%) core genes, 255 (8.6%) ancillary genes and 77 (2.9%) unique genes per strain. Depending on the strain the proportion of genes with 100% sequence identity to WA1 ranged from 85% to 20%. The result is a global comparative genomic analysis of B. hyodysenteriae genomes revealing potential differential phenotypic markers for numerous strains. Despite the differences found, the genomes were less varied than those of the related pathogenic species Brachyspira pilosicoli, and the analysis supports the clonal nature of the species. From this study, a public genome resource has been created that will serve as a repository for further genetic and phenotypic studies of these important porcine bacteria. This is the first intra-species B. hyodysenteriae comparative genomic analysis.

4.
The large and complex genome of wheat makes genetic and genomic analysis in this important species both expensive and resource intensive. The application of next-generation sequencing technologies is particularly resource intensive, with at least 17 Gbp of sequence data required to obtain minimal (1×) coverage of the genome. A similar volume of data would represent almost 40× coverage of the rice genome. Progress can be made through the establishment of consortia to produce shared genomic resources. Australian wheat genome researchers, working with Bioplatforms Australia, have collaborated in a national initiative to establish a genetic diversity dataset representing Australian wheat germplasm based on whole genome next-generation sequencing data. Here, we describe the establishment and validation of this resource which can provide a model for broader international initiatives for the analysis of large and complex genomes.
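The coverage figures quoted in this abstract follow from dividing the volume of sequence data by the genome size. The sketch below is only an illustration of that arithmetic, not part of the cited work: the wheat genome size of 17 Gbp comes from the abstract, while the rice genome size of roughly 0.43 Gbp is an assumed approximate value, and the function name `coverage` is hypothetical.

```python
def coverage(sequenced_bp: float, genome_size_bp: float) -> float:
    """Return the average fold coverage: sequenced bases divided by genome size."""
    return sequenced_bp / genome_size_bp


# Genome sizes in base pairs.
WHEAT_GENOME_BP = 17e9    # from the abstract: 17 Gbp gives 1x coverage
RICE_GENOME_BP = 0.43e9   # assumption: approximate rice genome size, not stated in the abstract

data_bp = 17e9  # the same data volume applied to both genomes

wheat_cov = coverage(data_bp, WHEAT_GENOME_BP)
rice_cov = coverage(data_bp, RICE_GENOME_BP)

print(f"wheat: {wheat_cov:.1f}x, rice: {rice_cov:.1f}x")
```

Under the assumed rice genome size, the same 17 Gbp of data works out to a little under 40× for rice, which is consistent with the "almost 40×" stated above.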

5.
Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete genomes from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Some studies have attempted to extract complete bacterial, archaeal, and viral genomes and often focus on species with circular genomes so they can help confirm completeness with circularity. However, less than 100 circularized bacterial and archaeal genomes have been assembled and published from metagenomics data despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a semi-automated method called Jorg to help circularize small bacterial, archaeal, and viral genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer based assemblies. We chose species of the Candidate Phyla Radiation (CPR) to focus our initial efforts because they have small genomes and are only known to have one ribosomal RNA operon. In addition to 34 circular CPR genomes, we present one circular Margulisbacteria genome, one circular Chloroflexi genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA. 
Code and a tutorial for this method are available at https://github.com/lmlui/Jorg, and the method is also available on the DOE Systems Biology KnowledgeBase as a beta app.

6.
The microbial pan-genome   Total citations: 1, self-citations: 0, citations by others: 1
A decade after the beginning of the genomic era, the question of how genomics can describe a bacterial species has not been fully addressed. Experimental data have shown that in some species new genes are discovered even after sequencing the genomes of several strains. Mathematical modeling predicts that new genes will be discovered even after sequencing hundreds of genomes per species. Therefore, a bacterial species can be described by its pan-genome, which is composed of a "core genome" containing genes present in all strains, and a "dispensable genome" containing genes present in two or more strains and genes unique to single strains. Given that the number of unique genes is vast, the pan-genome of a bacterial species might be orders of magnitude larger than any single genome.
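The core/dispensable split described in this abstract can be illustrated with plain set operations. The following is a minimal sketch under stated assumptions, not the authors' method: the input format (a mapping from strain name to a set of gene-family identifiers), the function name `partition_pan_genome`, and the toy strains are all hypothetical, and real analyses first need to cluster genes into families.

```python
from collections import Counter


def partition_pan_genome(strains):
    """Split a species' pan-genome into core, dispensable and unique genes.

    `strains` maps a strain name to the set of gene-family identifiers
    found in that strain (hypothetical input format).
    """
    n_strains = len(strains)
    # Count in how many strains each gene family occurs.
    counts = Counter(gene for genes in strains.values() for gene in genes)

    core = {g for g, c in counts.items() if c == n_strains}
    # A gene is "unique" only when there is more than one strain to compare.
    unique = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    pan = set(counts)
    # Dispensable genome: everything outside the core, including unique genes.
    dispensable = pan - core

    return {"pan": pan, "core": core, "dispensable": dispensable, "unique": unique}


# Toy example with three made-up strains.
toy = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g3", "g5"},
}
result = partition_pan_genome(toy)
print(sorted(result["core"]), sorted(result["dispensable"]), sorted(result["unique"]))
```

In this toy case only `g1` is core, while the pan-genome holds five gene families, which shows in miniature how a pan-genome can exceed any single strain's gene content.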

7.
Complex microbial communities typically contain a large number of low abundance species, which collectively comprise a considerable proportion of the community. This ‘rare biosphere’ has been speculated to contain keystone species and act as a repository of genomic diversity to facilitate community adaptation. Many environmental microbes are currently resistant to cultivation, and can only be accessed via culture‐independent approaches. To enhance our understanding of the role of the rare biosphere, we aimed to improve their metagenomic representation using DNA normalization methods, and assess normalization success via shotgun DNA sequencing. A synthetic metagenome was constructed from the genomic DNA of five bacterial species, pooled in a defined ratio spanning three orders of magnitude. The synthetic metagenome was fractionated and thermally renatured, allowing the most abundant sequences to hybridize. Double‐stranded DNA was removed either by hydroxyapatite chromatography, or by a duplex‐specific nuclease (DSN). The chromatographic method failed to enrich for the genomes present in low starting abundance, whereas the DSN method resulted in all genomes reaching near equimolar abundance. The representation of the rarest member was increased by approximately 450‐fold. De novo assembly of the normalized metagenome enabled up to 18.0% of genes from the rarest organism to be assembled, in contrast to the un‐normalized sample, where genes were not able to be assembled at the same sequencing depth. This study has demonstrated that the application of normalization methods to metagenomic samples is a powerful tool to enrich for sequences from rare taxa, which will shed further light on their ecological niches.

8.
9.
Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. To date, several whole genome sequences of bats have been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is also provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

10.
Chudin Eugene, Walker Randal, Kosaka Alan, Wu Sue X, Rabert Douglas, Chang Thomas K, Kreder Dirk E 《Genome biology》2002,4(1):1-10

Background

The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods.

Results

We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes.

Conclusion

The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome (BAC) clone regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable publicly available online CSEdb will expedite new discoveries through comparative genomics.

11.
12.
Recent years saw a dramatic increase in genomic and proteomic data in public archives. Now with the complete genome sequences of human and other species in hand, detailed analyses of the genome sequences will undoubtedly improve our understanding of biological systems and at the same time require sophisticated bioinformatic tools. Here we review the computational challenges ahead and the exciting new developments in this field.

13.
14.
Higher systematics within the Digenea, Carus 1863 have been relatively stable since a phylogenetic analysis of partial nuclear ribosomal markers (rDNA) led to the erection of the Diplostomida Olson, Cribb, Tkach, Bray, and Littlewood, 2003. However, recent mitochondrial (mt) genome phylogenies suggest this order might be paraphyletic. These analyses show members of two diplostomidan superfamilies are more closely related to the Plagiorchiida La Rue, 1957 than to other members of the Diplostomida. A recent phylogeny based on partial cytochrome c oxidase I also indicates one of the groups implicated, the Diplostomoidea Poirier, 1886, is non-monophyletic. To determine if these results were robust to additional taxon sampling, we analyzed mt genomes from seven diplostomoids in three families. To choose between phylogenetic alternatives based on mt genomes and the prior rDNA-based topology, we analyzed hundreds of ultra-conserved genomic elements assembled from shotgun sequencing. The Diplostomida was paraphyletic in the mt genome phylogeny but supported in the ultra-conserved genomic element phylogeny. We speculate this mitonuclear discordance is related to ancient, rapid radiation in the Digenea. Both ultra-conserved genomic elements and mt genomes support the monophyly of the Diplostomoidea and show congruent relationships within it. The Cyathocotylidae Mühling, 1898 are early diverging descendants of a paraphyletic clade of Diplostomidae Poirier, 1886, in which are nested members of the Strigeidae Railliet, 1919; the results support prior suggestions that the Crassiphialinae Sudarikov, 1960 will rise to the family level. Morphological traits of diplostomoid metacercariae appear to be more useful for differentiating clades than those of adults. We describe a new species of Cotylurus Szidat, 1928, resurrect a species of Hysteromorpha Lutz, 1931, and find support for a species of Alaria Schrank, 1788 of contested validity. 
Complete rDNA operons from seven diplostomoid species are provided as a resource for future studies.

15.
Ricker N  Qian H  Fulthorpe RR 《Genomics》2012,100(3):167-175
The de novo assembly of next generation sequencing data is a daunting task made more difficult by the presence of genomic repeats or transposable elements, resulting in an increasing number of genomes designated as completed draft assemblies. We created and assembled idealized sequence data sets for Cupriavidus metallidurans CH34, Caulobacter sp. K31, Gramella forsetii KT0803, Rhodobacter sphaeroides 2.4.1 and Bordetella bronchiseptica RB50. In addition to confirming the role of transposable elements in interrupting the assemblies, an association was found between the most fragmented regions and known or predicted genomic islands in these strains. Assembly quality was more strongly related to putative genomic island content than to any other factor examined. We believe this association indicates that draft assemblies are limiting our ability to understand the genomic context of important bacterial adaptations and that the increased effort required for finishing genomes can provide a wealth of information for future studies.

16.
17.
Many economically important crops have large and complex genomes that hamper their sequencing by standard methods such as whole genome shotgun (WGS). Large tracts of methylated repeats occur in plant genomes that are interspersed by hypomethylated gene‐rich regions. Gene‐enrichment strategies based on methylation profiles offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration with McrBC endonuclease digestion to enrich for euchromatic regions in the sugarcane genome. To verify the efficiency of methylation filtration and the assembly quality of sequences subjected to the gene‐enrichment strategy, we have compared assemblies using methyl‐filtered (MF) and unfiltered (UF) libraries. The use of methyl filtration allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5× more scaffolds and 1.7× more assembled Mb in length compared with the unfiltered dataset. The coverage of sorghum coding sequences (CDS) by MF scaffolds was at least 36% higher than by the use of UF scaffolds. Using MF technology, we increased by 134× the coverage of gene regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds that covered all genes of the sugarcane bacterial artificial chromosomes (BACs), 97.2% of sugarcane expressed sequence tags (ESTs), 92.7% of sugarcane RNA‐seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds from encoded enzymes of the sucrose/starch pathway discovered 291 single‐nucleotide polymorphisms (SNPs) in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes was also identified in the MF scaffolds. The information achieved by the MF dataset provides a valuable tool for genomic research in the genus Saccharum and for improvement of sugarcane as a biofuel crop.

18.
19.
Qu Zhang  Niclas Backström 《Chromosoma》2014,123(1-2):165-168
The complexity of eukaryote genomes makes assembly errors inevitable in the process of constructing reference genomes. Next-generation sequencing (NGS) could provide an efficient way to validate previously assembled genomes. Here, we exploited NGS data to interrogate the chicken reference genome and identified 35 pairs of nearly identical regions with >99.5% sequence similarity and a median size of 109 kb. Several lines of evidence, including read depth, the composition of junction sequences, and sequence similarity, suggest that these regions represent genome assembly errors and should be excluded from forthcoming genomic studies.

20.
Human gut microbiota modulates normal physiological functions, such as maintenance of barrier homeostasis and modulation of metabolism, as well as various chronic diseases including type 2 diabetes and gastrointestinal cancer. Despite decades of research, the composition of the gut microbiota remains poorly understood. Here, we established an effective extraction method to obtain high quality gut microbiota genomes, and analyzed them with third-generation sequencing technology. We acquired a large quantity of data from each sample and assembled large numbers of reliable contigs. With this approach, we constructed tens of completed bacterial genomes in which there were several new bacteria species. We also identified a new conditional pathogen, Enterococcus tongjius, which is a member of Enterococci. This work provided a novel and reliable approach to recover gut microbiota genomes, facilitating the discovery of new bacteria species and furthering our understanding of the microbiome that underlies human health and diseases.
Subject terms: DNA sequencing, Mechanisms of disease
