Similar literature
1.
SARS-CoV-2 belongs to the coronavirus family. Comparing the genomic features of viral genomes across the coronavirus family can improve our understanding of SARS-CoV-2. Here we present the first pan-genome analysis of 3,932 whole genomes of 101 species from 4 genera of the coronavirus family. We found a total of 181 genes in the pan-genome of the coronavirus family, among which only 3 genes, the S gene, M gene and N gene, are highly conserved. We also constructed a pan-genome from 23,539 whole genomes of SARS-CoV-2. There are 13 genes in total in the SARS-CoV-2 pan-genome, and all 13 are core genes for SARS-CoV-2. The pan-genome of coronaviruses shows a lower level of diversity than the pan-genomes of other RNA viruses, which contain no core gene. The three highly conserved genes in the coronavirus family, which are also core genes in the SARS-CoV-2 pan-genome, could be potential targets for developing nucleic acid diagnostic reagents with a decreased possibility of cross-reaction with other coronavirus species.

2.
Pan-genomic analysis of 30 Escherichia coli strains   Cited by: 2 (self-citations: 0, citations by others: 2)
Fu J, Qin QW. 《遗传》, 2012, 34(6): 765-772
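The pan-genome is the full complement of genes of a species. It comprises the core genome (genes present in all individuals of the species) and the dispensable genome (genes present in only some individuals, together with genes unique to a single individual). From a pan-genomic perspective, this paper compares the genes, gene composition and evolutionary features of 30 fully sequenced Escherichia coli strains. Core genes account for only about 50% of the genes in each strain, and each strain carries 146 strain-specific genes on average, indicating that new genes will continue to be discovered as more E. coli genomes are sequenced. Comparing gene conservation with GC content and selective pressure across strains showed that the more conserved a gene is, the narrower its range of GC content and the stronger the selective pressure it has experienced during evolution. These results help to clarify the evolutionary features of the E. coli genome and the dynamics of its gene composition, provide a theoretical basis for preventing and controlling epidemics caused by pathogenic E. coli, and offer a reference for methods of analyzing large-scale pathogen genome data.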

3.
《Genomics》2020,112(5):3003-3012
The genus Ochrobactrum comprises soil-dwelling Gram-negative bacteria mainly reported for bioremediation of toxic compounds. In the last few years, two species of this genus, O. intermedium and O. anthropi, have been documented as causing infections, mostly in immunocompromised patients. Despite this ubiquitous presence, studies of adaptation to various niches are still lacking. Thus, to gain insights into niche adaptation strategies, a pan-genome analysis was carried out by comparing 67 genome sequences belonging to Ochrobactrum species. The analysis revealed an open pan-genome, indicative of the continuously evolving nature of the genus. The presence/absence of gene clusters also showed that antibiotic efflux transporter genes and type IV secretion system genes are found uniquely in the clinical strains, while genes for solvent resistance and exporter pumps are found in the environmental strains. A phylogenomic investigation based on 75 core genes gave better and more robust phylogenetic resolution and topology than the 16S rRNA gene. To support the pan-genome analysis, individual genomes were also investigated for mobile genetic elements (MGE), antibiotic resistance genes (ARG), metal resistance genes (MRG) and virulence factors (VF). This analysis revealed the presence of MGE, ARG and MRG in all the strains; these elements play an important role in the evolution of the species, in agreement with the pan-genome analysis. The average nucleotide identity (ANI), which measures genetic relatedness between the Ochrobactrum species, indicated a distinction between individual species. Interestingly, the ANI tool was able to classify to the species level Ochrobactrum genomes that were assigned only to the genus level in the NCBI database.

4.
The bacterial genus Dietzia is widely distributed in various environments. The genomes of 26 diverse strains of Dietzia, including almost all the type strains, were analysed in this study. This analysis revealed a richness of lipid metabolism genes, which could explain the ability of Dietzia to live in oil-related environments. The pan-genome consists of 83,976 genes assigned to 10,327 gene families, 792 of which are shared by all the Dietzia genomes. Mathematical extrapolation of the data suggests that the Dietzia pan-genome is open. Both gene duplication and gene loss contributed to the open pan-genome, while horizontal gene transfer was limited. Dietzia strains primarily gained their diverse metabolic capacity through more ancient gene duplications. Phylogenetic analysis of Dietzia isolated from aquatic and terrestrial environments showed two distinct clades descending from the same ancestor. The genomes of Dietzia strains from aquatic environments were significantly larger than those from terrestrial environments, mainly because more gene loss events occurred during the evolution of the terrestrial strains. The evolutionary history of Dietzia was tightly coupled to environmental conditions, and iron concentration is likely to be one of the key factors shaping the genomes of the Dietzia lineages.
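
The "mathematical extrapolation" mentioned above can be illustrated with a power-law (Heaps'-law style) fit of pan-genome size against the number of genomes sampled. This is only a sketch: the abstract does not state which model the authors used, the function name `fit_power_law` is hypothetical, and the data below are synthetic values generated for illustration, not measurements from any study.

```python
import math

def fit_power_law(n_genomes, pan_sizes):
    """Fit P(N) = kappa * N**gamma by least squares on log-log values.

    n_genomes: number of genomes sampled at each step (all > 0).
    pan_sizes: pan-genome size (gene families) observed at each step.
    Returns (kappa, gamma).
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                       # slope in log-log space
    kappa = math.exp(mean_y - gamma * mean_x)  # intercept back-transformed
    return kappa, gamma

# Synthetic illustration data (an assumption, not real measurements),
# generated exactly from P(N) = 3000 * N**0.4 for N = 1..26.
ns = list(range(1, 27))
sizes = [3000 * n ** 0.4 for n in ns]
kappa, gamma = fit_power_law(ns, sizes)
```

Under this model, a fitted exponent clearly above zero means the pan-genome keeps growing as genomes are added, which is commonly read as "open"; an exponent near zero suggests saturation. Real analyses usually average over many random orderings of the genomes before fitting.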

5.
6.
Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available sequenced genomes of E. coli and Shigella spp. are compared, using basic methods to produce phylogenetic and proteomics trees, and to identify the pan- and core genomes of this set of sequenced strains. A hierarchical clustering of variable genes allowed clear separation of the strains into clusters, including known pathotypes; clinically relevant serotypes can also be resolved in this way. In contrast, when in silico MLST was performed, many of the strains appeared jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or 'accessory' genes thus make up more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggest a continuum rather than sharp species borders in this group of Enterobacteriaceae.

7.
Thermococcales adapt strongly to extreme environments, which is of profound interest for explaining how complex life forms emerged on Earth. However, their gene composition, thermal stability and evolution in hyperthermal environments are still poorly understood. Here, we characterized the pan-genome architecture of 30 Thermococcales species to gain insight into their genetic properties, evolutionary patterns and niche-adapted metabolisms. We revealed an open pan-genome of Thermococcales comprising 6070 gene families, a number that tends to increase as additional genomes become available. The genome contents of Thermococcales were flexible, with a series of genes that had experienced gene duplication, progressive divergence, or gene gain and loss events exhibiting distinct functional features. These archaea had a concise set of heat shock protein types, such as HSP20, HSP60 and prefoldin, which were constrained by strong purifying selection that governed their conservative evolution. Furthermore, purifying selection forced genes involved in enzymes, motility, secretion systems, defence systems and chaperones to differ in functional constraints, and their disparity in evolutionary rate may be related to adaptation to specific niches. These results deepen our understanding of the genetic diversity and adaptation patterns of Thermococcales, and provide valuable research models for studying the metabolic traits of early life forms.

8.
Riemerella anatipestifer (RA) is a Gram-negative bacterium that has a high potential to infect waterfowl. Although more and more RA genomes have been generated, comparative genomic analysis of RA still remains at the level of individual species. In this study, we analysed the pan-genome of 27 virulent RA isolates to reveal the intraspecies genomic diversity from various aspects. The multi-locus sequence typing (MLST) analysis suggests that the geographic origin of R. anatipestifer is Guangdong province, China. The pan-genome analysis revealed an open pan-genome of 2967 genes for all 27 isolates. Among 555 unique genes, we identified 387 that originated by horizontal gene transfer (HGT). Further studies showed that 204 strain-specific HGT genes were predicted to encode virulent proteins. Screening the 1113 core genes of RA through a subtractive genomic approach predicted 70 putative vaccine targets out of 125 non-cytoplasmic proteins. Further analysis of these proteins, which are not homologous to A. platyrhynchos proteins, predicted 56 essential proteins as drug targets; these have more interaction partners and are involved in metabolic pathways unique to RA. In conclusion, the present study describes the essence and diversity of RA and provides useful information for identifying vaccine and drug candidates in future.

9.

Background

Recently, Marcus et al. (Bioinformatics 30:3476–83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an \(O(n\log g)\) time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497–504, 2016) improved their result.

Results

In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele—a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.
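
To make the notion of a compressed de Bruijn graph concrete, the sketch below builds the uncompressed graph of k-mers naively and then merges every non-branching chain of k-mers into a single node label. This is exactly the detour that splitMEM is described as avoiding, so it is not that algorithm and not the space-efficient representation proposed in the paper; the function name `compressed_de_bruijn_nodes` is hypothetical, and the sketch returns only node labels, not edges or genome positions.

```python
from collections import defaultdict

def compressed_de_bruijn_nodes(seqs, k):
    """Return the node labels of a compressed de Bruijn graph (naive build)."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for s in seqs:
        kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
        nodes.update(kmers)
        for u, v in zip(kmers, kmers[1:]):  # consecutive k-mers share an edge
            succ[u].add(v)
            pred[v].add(u)

    def mergeable(u, v):
        # Edge u -> v is collapsed if it is u's only exit and v's only entry.
        return u != v and len(succ[u]) == 1 and len(pred[v]) == 1

    def is_chain_start(v):
        return not (len(pred[v]) == 1 and mergeable(next(iter(pred[v])), v))

    def walk(start, seen):
        path = [start]
        seen.add(start)
        while len(succ[path[-1]]) == 1:
            nxt = next(iter(succ[path[-1]]))
            if nxt in seen or not mergeable(path[-1], nxt):
                break
            path.append(nxt)
            seen.add(nxt)
        # Overlapping k-mers spell the first k-mer plus one base per step.
        return path[0] + "".join(kmer[-1] for kmer in path[1:])

    seen, labels = set(), []
    starts = [v for v in sorted(nodes) if is_chain_start(v)]
    # The second pass over all nodes only picks up isolated cycles.
    for v in starts + sorted(nodes):
        if v not in seen:
            labels.append(walk(v, seen))
    return labels

labels = compressed_de_bruijn_nodes(["ACGTT", "ACGAA"], 3)
```

For the two toy "genomes" above, the shared prefix `ACG` becomes one node that branches into the two strain-specific nodes `CGTT` and `CGAA`, which is the kind of structure a pattern search over a pan-genome graph has to traverse.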

10.
11.

Background

Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how gene families are distributed among the genomes. Retrieving a complete gene distribution among a class of genomes is an NP-hard problem, and computational costs increase with the number of analyzed genomes: all-against-all gene comparisons are required to solve the problem completely. For phylogenetically distant genomes, the variability introduced by gene duplication and transmission makes the task of recognizing homologous genes even more difficult. A challenge in this field is to design fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations.

Results

We present PanDelos, a standalone tool for the discovery of pan-genome contents among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Candidate homology relations are integrated into a global network, and groups of homologous genes are extracted by applying a community detection algorithm.

Conclusions

PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running time and quality of content discovery. Tests were run on collections of real genomes previously used in analogous studies, and on synthetic benchmarks that provide a fully trusted ground truth. The software is available at https://github.com/GiugnoLab/PanDelos.
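
As an illustration of what an alignment-free measure based on k-mer multiplicity can look like, the sketch below computes a generalized Jaccard similarity over k-mer counts. It is an assumption made for illustration, not a reproduction of the PanDelos measure: the function names are hypothetical, and the fixed `k=3` default stands in for the k-mer length that the tool is said to derive from general arguments.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (its multiplicity)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def generalized_jaccard(a, b, k=3):
    """Sum of per-k-mer minimum counts divided by sum of maximum counts.

    Returns 1.0 for identical k-mer multisets and 0.0 when no k-mer is shared
    (or when both sequences are shorter than k).
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    keys = set(ca) | set(cb)
    shared = sum(min(ca[x], cb[x]) for x in keys)   # Counter returns 0 if absent
    total = sum(max(ca[x], cb[x]) for x in keys)
    return shared / total if total else 0.0

similarity = generalized_jaccard("AAAA", "AAA", k=2)
```

Because the measure needs only one pass over each sequence, candidate gene pairs can be scored without alignment; pairs above a data-derived threshold would then become edges in the homology network described above.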

12.
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.

A greatly improved reference genome sequence of barley was assembled from accurate long reads.

13.
The microbial pan-genome   Cited by: 1 (self-citations: 0, citations by others: 1)
A decade after the beginning of the genomic era, the question of how genomics can describe a bacterial species has not been fully addressed. Experimental data have shown that in some species new genes are discovered even after sequencing the genomes of several strains. Mathematical modeling predicts that new genes will be discovered even after sequencing hundreds of genomes per species. Therefore, a bacterial species can be described by its pan-genome, which is composed of a "core genome" containing genes present in all strains, and a "dispensable genome" containing genes present in two or more strains and genes unique to single strains. Given that the number of unique genes is vast, the pan-genome of a bacterial species might be orders of magnitude larger than any single genome.
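
The core/dispensable split defined above can be sketched directly from a gene presence table. This is a minimal illustration that assumes genes have already been grouped into families; the function name `partition_pan_genome`, the strain names and the gene names are all hypothetical toy values.

```python
from collections import Counter

def partition_pan_genome(genomes):
    """Split gene families into core, shared-dispensable and unique sets.

    genomes: dict mapping strain name -> set of gene-family identifiers.
    With a single genome every gene is both core and unique, so the
    partition is only meaningful for two or more genomes.
    """
    n = len(genomes)
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    pan = set(counts)                                    # every family seen
    core = {g for g, c in counts.items() if c == n}      # present in all strains
    unique = {g for g, c in counts.items() if c == 1}    # present in one strain
    shared = pan - core - unique                         # in 2..n-1 strains
    return {"pan": pan, "core": core, "shared": shared, "unique": unique}

# Toy input for illustration only.
toy = {
    "strainA": {"dnaA", "gyrB", "recA", "pilX"},
    "strainB": {"dnaA", "gyrB", "recA", "tetM"},
    "strainC": {"dnaA", "gyrB", "pilX", "capK"},
}
result = partition_pan_genome(toy)
```

In this toy case the pan-genome has six families, of which two are core; the dispensable genome is the union of the `shared` and `unique` sets.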

14.
Campylobacter jejuni strain M1 (laboratory designation 99/308) represents a rarely documented case of direct transmission of C. jejuni from a chicken to a person, resulting in enteritis. We have sequenced the genome of C. jejuni strain M1 and compared it to 12 other sequenced C. jejuni genomes currently publicly available. Of these, M1 is closest to strain 81116. Based on the 13 genome sequences, we have identified the C. jejuni pan-genome, as well as the core genome, the auxiliary genes, and genes unique between strains M1 and 81116. The pan-genome contains 2,427 gene families, whilst the core genome comprises 1,295 gene families, or about two-thirds of the gene content of an average sequenced C. jejuni genome. Various comparison and visualization tools were applied to the 13 C. jejuni genome sequences, including a species pan- and core genome plot, a BLAST Matrix and a BLAST Atlas. Trees based on 16S rRNA sequences and on the total gene families in each genome are presented. The findings are discussed against the background of the proven virulence potential of M1.

15.
16.
17.
Zhu Guangtao (祝光涛), Huang Sanwen (黄三文). 《植物学报》, 2020, 55(4): 403-406
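Soybean (Glycine max) is an important oil and protein crop, and its rich genetic variation provides an important resource base for discovering biological traits and for breeding improvement. However, a single genome cannot fully reveal the genetic variation in germplasm resources, and pan-genome research offers a new way to address this shortcoming. Recently, the research teams of Tian Zhixi and Liang Chengzhi at the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, selected 26 representative accessions from 2,898 soybean germplasm accessions, integrated three existing genomes, and constructed a pan-genome and a graph-based genome covering wild and cultivated soybeans. They identified a dataset containing the vast majority of structural variations in the whole population and defined the core, dispensable and individual-specific gene sets of soybean germplasm. Using these data, they systematically revealed allelic variation and gene fusion events at the maturity locus E3, the haplotypes and evolutionary relationships of the seed coat color gene I, and the influence of structural variation on the expression of an iron transporter gene and on selection for regional adaptation. This study provides a new model for crop genomics research and will accelerate the identification of genetic variation, trait dissection and germplasm innovation in soybean.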

18.
Background: Rates of recombination vary by three orders of magnitude in bacteria, but the reasons for this variation are unclear. We performed a genome-wide study of recombination rate variation among genes in the intracellular bacterium Bartonella henselae, which has among the lowest estimated ratios of recombination relative to mutation in prokaryotes. Results: The 1.9 Mb genomes of B. henselae strains IC11, UGA10 and Houston-1 showed only minor gene content variation. Nucleotide sequence divergence levels were less than 1%, and the relative rate of recombination to mutation was estimated at 1.1 for the genome overall. Four to eight segments per genome presented significantly enhanced divergences, the most pronounced of which were the virB and trw gene clusters for type IV secretion systems that play essential roles in the infection process. Consistently, multiple recombination events were identified inside these gene clusters. High recombination frequencies were also observed for a gene putatively involved in iron metabolism. A phylogenetic study of this gene in 80 strains of Bartonella quintana, B. henselae and B. grahamii indicated different population structures for each species and revealed horizontal gene transfers across Bartonella species with different host preferences. Conclusions: Our analysis has shown little novel gene acquisition in B. henselae, indicative of a closed pan-genome, but higher recombination frequencies within the population than previously estimated. We propose that the dramatically increased fixation rate for recombination events at gene clusters for type IV secretion systems is driven by selection for sequence variability.

19.
Homoeologous regions of Brassica genomes were analyzed at the sequence level. These represent segments of the Brassica A genome as found in Brassica rapa and Brassica napus and the corresponding segments of the Brassica C genome as found in Brassica oleracea and B. napus. Analysis of synonymous base substitution rates within modeled genes revealed a relatively broad range of times (0.12 to 1.37 million years ago) since the divergence of orthologous genome segments as represented in B. napus and the diploid species. Similar, and consistent, ranges were also identified for single nucleotide polymorphism and insertion-deletion variation. Genes conserved across the Brassica genomes and the homoeologous segments of the genome of Arabidopsis thaliana showed almost perfect collinearity. Numerous examples of apparent transduplication of gene fragments, as previously reported in B. oleracea, were observed in B. rapa and B. napus, indicating that this phenomenon is widespread in Brassica species. In the majority of the regions studied, the C genome segments were expanded in size relative to their A genome counterparts. The considerable variation that we observed in gene fragments and annotated putative genes, even between different versions of the same Brassica genome, suggests that the concept of the pan-genome might be particularly appropriate when considering Brassica genomes.

20.
Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its “pan-genome”. We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800–3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25–53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. 
The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis, in order to link the distribution pattern of a specific phenotype to the presence/absence of specific sets of genes.
