Similar articles
1.
ABSTRACT: BACKGROUND: Oenococcus oeni, a member of the lactic acid bacteria, is one of a limited number of microorganisms that not only survive, but actively proliferate in wine. It is also unusual in that, unlike the majority of bacteria present in wine, it is beneficial to wine quality rather than causing spoilage. These benefits are realised primarily through catalysing malolactic fermentation, but also through imparting other positive sensory properties. However, many of these industrially-important secondary attributes have been shown to be strain-dependent and their genetic basis is yet to be determined. RESULTS: In order to investigate the scale and scope of genetic variation in O. oeni, we have performed whole-genome sequencing on eleven strains of this bacterium, bringing the total number of strains for which genome sequences are available to fourteen. While any single strain of O. oeni was shown to contain around 1800 protein-coding genes, in-depth comparative annotation based on genomic synteny and protein orthology identified over 2800 orthologous open reading frames that comprise the pan genome of this species, and fewer than 1200 genes that make up the conserved genomic core present in all of the strains. The expansion of the pan genome relative to the coding potential of individual strains was shown to be due to the varied presence and location of multiple distinct bacteriophage sequences and also to variation in metabolic functions with potential impacts on the industrial performance of this species, including cell wall exopolysaccharide biosynthesis, sugar transport and utilisation and amino acid biosynthesis. CONCLUSIONS: By providing a large cohort of sequenced strains, this study provides a broad insight into the genetic variation present within O. oeni. This data is vital to understanding and harnessing the phenotypic variation present in this economically-important species.
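The pan genome and conserved core described in this abstract can be illustrated with plain set operations. The following is a minimal sketch, assuming each strain has already been reduced to a set of orthologous-group identifiers; the function name, strain names, and gene labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Set, Tuple


def pan_and_core(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan genome, core genome) for a mapping of strain -> ortholog IDs.

    The pan genome is the union of all gene sets; the core genome is the
    intersection, i.e. the genes present in every strain.
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


# Toy data (hypothetical): three strains with partly overlapping gene content.
toy = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g3", "g5"},
}
pan, core = pan_and_core(toy)
accessory = pan - core  # genes that are missing from at least one strain
```

In this toy case the pan genome is larger than any single strain's gene set while the core is smaller, which mirrors the relationship reported above (about 1800 genes per strain, more than 2800 in the pan genome, fewer than 1200 in the core).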

2.
Advanced resources for genome‐assisted research in barley (Hordeum vulgare) including a whole‐genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole‐genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA‐coding exome reduces barley genomic complexity more than 50‐fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in‐solution hybridization‐based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full‐length cDNAs and de novo assembled RNA‐Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA‐coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping‐by‐sequencing and genetic diversity analyses.

3.
The development of next-generation sequencing platforms is set to reveal an unprecedented level of detail on short-term molecular evolutionary processes in bacteria. Here we re-analyse genome-wide single nucleotide polymorphism (SNP) datasets for recently emerged clones of methicillin-resistant Staphylococcus aureus (MRSA) and Clostridium difficile. We note a highly significant enrichment of synonymous SNPs in those genes which have been affected by recombination, i.e. those genes on mobile elements designated "non-core" (in the case of S. aureus), or those core genes which have been affected by homologous replacements (S. aureus and C. difficile). This observation suggests that the previously documented decrease in dN/dS over time in bacteria applies not only to genomes of differing levels of divergence overall, but also to horizontally acquired genes of differing levels of divergence within a single genome. We also consider the role of increased drift acting on recently emerged, highly specialised clones, and the impact of recombination on selection at linked sites. This work has implications for a wide range of genomic analyses.

4.
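Genome annotation is the process of identifying the functional elements in a genome sequence. It assigns biological meaning directly to the sequence, which makes it easier for researchers to explore and analyse genome function. Genome annotation helps researchers understand a genome at three levels. The first is nucleotide-level annotation, which mainly determines the physical positions of elements such as genes, RNAs, and repeat sequences in the DNA sequence, including specific positional information such as transcription start sites, translation start sites, and exon boundaries. Annotation can also identify variation in ...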

5.
Autism spectrum disorders (ASD) are a group of related neurodevelopmental disorders with significant combined prevalence (~1%) and high heritability. Dozens of individually rare genes and loci associated with high risk for ASD have been identified, which overlap extensively with genes for intellectual disability (ID). However, studies indicate that there may be hundreds of genes that remain to be identified. The advent of inexpensive massively parallel nucleotide sequencing can reveal the genetic underpinnings of heritable complex diseases, including ASD and ID. However, whole exome sequencing (WES) and whole genome sequencing (WGS) provide an embarrassment of riches, where many candidate variants emerge. It has been argued that genetic variation for ASD and ID will cluster in genes involved in distinct pathways and protein complexes. For this reason, computational methods that prioritize candidate genes based on additional functional information such as protein-protein interactions or association with specific canonical or empirical pathways, or other attributes, can be useful. In this study we applied several supervised learning approaches to prioritize ASD or ID disease gene candidates based on curated lists of known ASD and ID disease genes. We implemented two network-based classifiers and one attribute-based classifier to show that we can rank and classify known, and predict new, genes for these neurodevelopmental disorders. We also show that ID and ASD share common pathways that perturb an overlapping synaptic regulatory subnetwork. We also show that features relating to neuronal phenotypes in mouse knockouts can help in classifying neurodevelopmental genes. Our methods can be applied broadly to other diseases, helping to prioritize newly identified genetic variation that emerges from disease gene discovery based on WES and WGS.

6.
Chromosome rearrangements associated with neoplasms provide a rich resource for definition of the pathways of tumorigenesis. The power of comparative genome hybridization (CGH) to identify novel genes depends on the existence of suitable markers, which are lacking throughout most of the genome. We now report a general approach that translates CGH data into higher-resolution genomic-clone data that are then used to define the genes located in aneuploid regions. We used CGH to study 33 thyroid-tumor DNAs and two tumor-cell-line DNAs. The results revealed amplifications of chromosome band 2p21, with less-intense amplification on 2p13, 19q13.1, and 1p36 and with least-intense amplification on 1p34, 1q42, 5q31, 5q33-34, 9q32-34, and 14q32. To define the 2p21 region amplified, a dense array of 373 FISH-mapped chromosome 2 bacterial artificial chromosomes (BACs) was constructed, and 87 of these were hybridized to a tumor-cell line. Four BACs carried genomic DNA that was amplified in these cells. The maximum amplified region was narrowed to 3-6 Mb by multicolor FISH with the flanking BACs, and the minimum amplicon size was defined by a contig of 420 kb. Sequence analysis of the amplified BAC 1D9 revealed a fragment of the gene, encoding protein kinase C epsilon (PKCepsilon), that was then shown to be amplified and rearranged in tumor cells. In summary, CGH combined with a dense mapped resource of BACs and large-scale sequencing has led directly to the definition of PKCepsilon as a previously unmapped candidate gene involved in thyroid tumorigenesis.

7.
Escherichia coli, including the closely related genus Shigella, is a highly diverse species in terms of genome structure. Comparative genomic hybridization (CGH) microarray analysis was used to compare the gene content of E. coli K-12 with the gene contents of pathogenic strains. Missing genes in a pathogen were detected on a microarray slide spotted with 4,071 open reading frames (ORFs) of W3110, a commonly used wild-type K-12 strain. For 22 strains subjected to the CGH microarray analyses, 1,424 ORFs were found to be absent in at least one strain. The common backbone of the E. coli genome was estimated to contain about 2,800 ORFs. The mosaic distribution of absent regions indicated that the genomes of pathogenic strains were highly diversified because of insertions and deletions. Prophages, cell envelope genes, transporter genes, and regulator genes in the K-12 genome often were not present in pathogens. The gene contents of the strains tested were recognized as a matrix for a neighbor-joining analysis. The phylogenic tree obtained was consistent with the results of previous studies. However, unique relationships between enteroinvasive strains and Shigella, uropathogenic, and some enteropathogenic strains were suggested by the results of this study. The data demonstrated that the CGH microarray technique is useful not only for genomic comparisons but also for phylogenic analysis of E. coli at the strain level.
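To make the idea of treating gene content as a matrix for tree building more concrete, the sketch below turns ORF presence/absence data into a pairwise distance matrix of the kind a neighbor-joining program takes as input. The abstract does not state which distance measure was used; the Jaccard distance here is an assumption chosen for illustration, and the strain and ORF names are hypothetical. The neighbor-joining step itself is not implemented.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A and B| / |A or B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(
    presence: Dict[str, Set[str]]
) -> Tuple[List[str], Dict[Tuple[str, str], float]]:
    """Build a symmetric distance lookup from strain -> set of ORFs detected."""
    names = sorted(presence)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(presence[x], presence[y])
        dist[(x, y)] = dist[(y, x)] = d
    return names, dist


# Toy data (hypothetical): ORFs of a reference strain detected in each strain.
presence = {
    "reference": {"orf1", "orf2", "orf3", "orf4"},
    "pathogen_1": {"orf1", "orf2"},
    "pathogen_2": {"orf1", "orf2", "orf3"},
}
names, dist = distance_matrix(presence)
```

The resulting matrix could then be passed to any neighbor-joining implementation; strains sharing more ORFs end up closer together.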

8.
Detection of chromosomal aberrations from a single cell by array comparative genomic hybridization (single-cell array CGH), instead of from a population of cells, is an emerging technique. However, such detection is challenging because of the genome artifacts and the DNA amplification process inherent to the single cell approach. Current normalization algorithms result in inaccurate aberration detection for single-cell data. We propose a normalization method based on channel, genome composition and recurrent genome artifact corrections. We demonstrate that the proposed channel clone normalization significantly improves the copy number variation detection in both simulated and real single-cell array CGH data.

9.
10.
The α-proteobacteria represent one of the most diverse bacterial subdivisions, displaying extreme variations in lifestyle, geographical distribution and genome size. Species for which genome data are available have been classified into a species tree based on a conserved set of vertically inherited core genes. By mapping the variation in gene content onto the species tree, genomic changes can be associated with adaptations to specific growth niches. Genes for adaptive traits are mostly located in ‘plasticity zones’ in the bacterial genome, which also contain mobile elements and are highly variable across strains. By physically separating genes for information processing from genes involved in interactions with the surrounding environment, the rate of evolutionary change can be substantially enhanced for genes underlying adaptation to new growth habitats, possibly explaining the ecological success of the α-proteobacterial subdivision.

11.
Array-based comparative genomic hybridization (aCGH) has gained prevalence as an effective technique for measuring structural variations in the genome. Copy-number variations (CNVs) form a large source of genomic structural variation, but it is not known whether phenotypic differences between intra-species groups, such as divergent human populations, or breeds of a domestic animal, can be attributed to CNVs. Several computational methods have been proposed to improve the detection of CNVs from array CGH data, but few population studies have used CGH data for identification of intra-species differences. In this paper we propose a novel method of genome-wide comparison and classification using CGH data that condenses whole genome information, aimed at quantification of intra-species variations and discovery of shared ancestry. Our strategy included smoothing CGH data using an appropriate denoising algorithm, extracting features via wavelets, quantifying the information via wavelet power spectrum and hierarchical clustering of the resultant profile. To evaluate the classification efficiency of our method, we used simulated data sets. We applied it to aCGH data from human and bovine individuals and showed that it successfully detects existing intra-specific variations with additional evolutionary implications.

12.

Background

Molecular alterations critical to development of cancer include mutations, copy number alterations (amplifications and deletions) as well as genomic rearrangements resulting in gene fusions. Massively parallel next generation sequencing, which enables the discovery of such changes, uses considerable quantities of genomic DNA (> 5 µg), a serious limitation in ever smaller clinical samples. However, commonly available microarray platforms such as array comparative genomic hybridization (array CGH) allow the characterization of gene copy number at a single gene resolution using much smaller amounts of genomic DNA. In this study we evaluate the sensitivity of ultra-dense array CGH platforms developed by Agilent, especially that of the 1 million probe array (1 M array), and their application when whole genome amplification is required because of limited sample quantities.

Methods

We performed array CGH on whole genome amplified and not amplified genomic DNA from MCF-7 breast cancer cells, using 244 K and 1 M Agilent arrays. The ADM-2 algorithm was used to identify micro-copy number alterations that measured less than 1 Mb in genomic length.

Results

DNA from MCF-7 breast cancer cells was analyzed for micro-copy number alterations (micro-CNAs), defined as measuring less than 1 Mb in genomic length. The 4-fold extra resolution of the 1 M array platform relative to the less dense 244 K array platform led to the improved detection of copy number variations (CNVs) and micro-CNAs. The identification of intra-genic breakpoints in areas of DNA copy number gain signaled the possible presence of gene fusion events. However, the ultra-dense platforms, especially the densest 1 M array, detect artifacts inherent to whole genome amplification and should be used only with non-amplified DNA samples.

Conclusions

This is a first report using 1 M array CGH for the discovery of cancer genes and biomarkers. We show the remarkable capacity of this technology to discover CNVs, micro-copy number alterations and even gene fusions. However, these platforms require excellent genomic DNA quality and do not tolerate relatively small imperfections related to whole genome amplification.

13.
The aim of this investigation was to exploit the vast comparative data generated by comparative genome hybridization (CGH) studies of Campylobacter jejuni in developing a genotyping method. We examined genes in C. jejuni that exhibit binary status (present or absent between strains) within known plasticity regions, in order to identify a minimal subset of gene targets that provide high-resolution genetic fingerprints. Using CGH data from three studies as input, binary gene sets were identified with "Minimum SNPs" software. "Minimum SNPs" selects for the minimum number of targets required to obtain a predefined resolution, based on Simpson's index of diversity (D). After implementation of stringent criteria for gene presence/absence, eight binary genes were found that provided 100% resolution (D=1) of 20 C. jejuni strains. A real-time PCR assay was developed and tested on 181 C. jejuni and Campylobacter coli isolates, a subset of which have previously been characterized by multilocus sequence typing, flaA short variable region sequencing, and pulsed-field gel electrophoresis. In addition to the binary gene real-time PCR assay, we refined the seven-member single nucleotide polymorphism (SNP) real-time PCR assay previously described for C. jejuni and C. coli. By normalizing the SNP assay with the respective C. jejuni and C. coli ubiquitous genes, mapA and ceuE, the polymorphisms at each SNP could be determined without separate reactions for every polymorphism. We have developed and refined a rapid, highly discriminatory genotyping method for C. jejuni and C. coli that uses generic technology and is amenable to high-throughput analyses.
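The selection of a small set of binary genes that reaches a target Simpson's index of diversity can be sketched as a greedy search. This is an illustration only: the actual algorithm used by the "Minimum SNPs" software is not described in the abstract, so the greedy strategy is an assumption, as is the use of the Hunter-Gaston form of the index, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)). All function, strain, and gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def simpsons_index(profiles: Sequence[Hashable]) -> float:
    """Simpson's index of diversity (Hunter-Gaston form) over strain profiles.

    D = 1 means every strain has a unique profile; D = 0 means all are identical.
    """
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two strains")
    counts = Counter(profiles)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def greedy_select(
    matrix: Dict[str, Dict[str, int]], target: float = 1.0
) -> Tuple[List[str], float]:
    """Greedily add the gene that most increases D until the target is reached.

    `matrix` maps strain -> {gene: 0 or 1}. Greedy selection is a heuristic
    and is not guaranteed to find the smallest possible gene set.
    """
    strains = sorted(matrix)
    genes = sorted(next(iter(matrix.values())))
    chosen: List[str] = []
    best = 0.0
    while best < target:
        scored = [
            (simpsons_index([tuple(matrix[s][x] for x in chosen + [g]) for s in strains]), g)
            for g in genes
            if g not in chosen
        ]
        if not scored:
            break
        score, gene = max(scored, key=lambda t: t[0])
        if score <= best:  # no remaining gene improves resolution
            break
        chosen.append(gene)
        best = score
    return chosen, best


# Toy presence/absence data (hypothetical): gC is uninformative.
toy = {
    "s1": {"gA": 1, "gB": 1, "gC": 0},
    "s2": {"gA": 1, "gB": 0, "gC": 0},
    "s3": {"gA": 0, "gB": 1, "gC": 0},
    "s4": {"gA": 0, "gB": 0, "gC": 0},
}
chosen, best = greedy_select(toy)
```

On the toy data, two genes are enough to give every strain a distinct profile, which corresponds to the D=1 resolution reported for the eight binary genes in the abstract.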

14.
Staphylococcus aureus causes disease in humans and a wide array of animals. Of note, S. aureus mastitis of ruminants, including cows, sheep, and goats, results in major economic losses worldwide. Extensive variation in genome content exists among S. aureus pathogenic clones. However, the genomic variation among S. aureus strains infecting different animal species has not been well examined. To investigate variation in the genome content of human and ruminant S. aureus, we carried out whole-genome PCR scanning (WGPS), comparative genomic hybridizations (CGH), and the directed DNA sequence analysis of strains of human, bovine, ovine, and caprine origin. Extensive variation in genome content was discovered, including host- and ruminant-specific genetic loci. Ovine and caprine strains were genetically allied, whereas bovine strains were heterogeneous in gene content. As expected, mobile genetic elements such as pathogenicity islands and bacteriophages contributed to the variation in genome content between strains. However, differences specific for ruminant strains were restricted to regions of the conserved core genome, which contained allelic variation in genes encoding proteins of known and unknown function. Many of these proteins are predicted to be exported and could play a role in host-pathogen interactions. The genomic regions of difference identified by the whole-genome approaches adopted in the current study represent excellent targets for studies of the molecular basis of S. aureus host adaptation.

15.
Multiple disease resistance has important implications for plant fitness, given the selection pressure that many pathogens exert directly on natural plant populations and indirectly via crop improvement programs. Evidence of a locus conditioning resistance to multiple pathogens was found in bin 1.06 of the maize genome with the allele from inbred line “Tx303” conditioning quantitative resistance to northern leaf blight (NLB) and qualitative resistance to Stewart’s wilt. To dissect the genetic basis of resistance in this region and to refine candidate gene hypotheses, we mapped resistance to the two diseases. Both resistance phenotypes were localized to overlapping regions, with the Stewart’s wilt interval refined to a 95.9-kb segment containing three genes and the NLB interval to a 3.60-Mb segment containing 117 genes. Regions of the introgression showed little to no recombination, suggesting structural differences between the inbred lines Tx303 and “B73,” the parents of the fine-mapping population. We examined copy number variation across the region using next-generation sequencing data, and found large variation in read depth in Tx303 across the region relative to the reference genome of B73. In the fine-mapping region, association mapping for NLB implicated candidate genes, including a putative zinc finger and pan1. We tested mutant alleles and found that pan1 is a susceptibility gene for NLB and Stewart’s wilt. Our data strongly suggest that structural variation plays an important role in resistance conditioned by this region, and pan1, a gene conditioning susceptibility for NLB, may underlie the QTL.

16.
17.
Comparative genomic hybridization (CGH) microarrays have been used to determine copy number variations (CNVs) and their effects on complex diseases. Detection of absolute CNVs independent of genomic variants of an arbitrary reference sample has been a critical issue in CGH array experiments. Whole genome analysis using massively parallel sequencing with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform. The algorithm enables the removal and rescue of false positive and false negative CNVs, respectively, which appear due to the effects of genomic variants of the reference sample in raw CGH array experiments. We found that the CARA remarkably enhanced the accuracy of CGH array in determining absolute CNVs. Our method thus provides a new approach to interpret CGH array data for personalized medicine.

18.
19.
20.
Phase variation, mediated through variation in the length of simple sequence repeats, is recognized as an important mechanism for controlling the expression of factors involved in bacterial virulence. Phase variation is associated with most of the currently recognized virulence determinants of Neisseria meningitidis. Based upon the complete genome sequence of the N. meningitidis serogroup B strain MC58, we have identified tracts of potentially unstable simple sequence repeats and their potential functional significance determined on the basis of sequence context. Of the 65 potentially phase variable genes identified, only 13 were previously recognized. Comparison with the sequences from the other two pathogenic Neisseria sequencing projects shows differences in the length of the repeats in 36 of the 65 genes identified, including 25 of those not previously known to be phase variable. Six genes that did not have differences in the length of the repeat instead had polymorphisms such that the gene would not be expected to be phase variable in at least one of the other strains. A further 12 candidates did not have homologues in either of the other two genome sequences. The large proportion of these genes that are associated with frameshifts and with differences in repeat length between the neisserial genome sequences is further corroborative evidence that they are phase variable. The number of potentially phase variable genes is substantially greater than for any other species studied to date, and would allow N. meningitidis to generate a very large repertoire of phenotypes through expression of these genes in different combinations. 
Novel phase variable candidates identified in the strain MC58 genome sequence include a spectrum of genes encoding glycosyltransferases, toxin related products, and metabolic activities as well as several restriction/modification and bacteriocin-related genes and a number of open reading frames (ORFs) for which the function is currently unknown. This suggests that the potential role of phase variation in mediating bacterium-host interactions is much greater than has been appreciated to date. Analysis of the distribution of homopolymeric tract lengths indicates that this species has sequence-specific mutational biases that favour the instability of sequences associated with phase variation.
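A first pass at locating homopolymeric tracts, one class of the simple sequence repeats discussed in this abstract, can be sketched with a regular expression. This is a minimal illustration and not the authors' procedure: the minimum tract length of 7 is an arbitrary assumption, the function name and sequence are hypothetical, and no attempt is made to assess sequence context or coding frame.

```python
import re
from typing import List, Tuple


def homopolymer_tracts(seq: str, min_len: int = 7) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for each run of one base >= min_len long.

    `start` is a 0-based index into the sequence. The threshold is an
    illustrative assumption, not a value taken from the study.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures one base; \1{n,} requires at least n further copies.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence (hypothetical): a 9-base G tract and a 7-base T tract.
sequence = "ATC" + "G" * 9 + "CA" + "T" * 7 + "AC"
tracts = homopolymer_tracts(sequence)
long_tracts = homopolymer_tracts(sequence, min_len=8)
```

Comparing the tract lengths found at the same locus in different genome sequences would be the natural next step, since length differences between strains are what the abstract treats as corroborating evidence of phase variability.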
