Similar articles
 20 similar articles found (search time: 31 ms)
1.
Brazelton WJ  Baross JA 《PloS one》2010,5(10):e13530

Background

The most widespread bacteria in oxic zones of carbonate chimneys at the serpentinite-hosted Lost City hydrothermal field, Mid-Atlantic Ridge, belong to the Thiomicrospira group of sulfur-oxidizing chemolithoautotrophs. It is unclear why Thiomicrospira-like organisms thrive in these chimneys considering that Lost City hydrothermal fluids are notably lacking in hydrogen sulfide and carbon dioxide.

Methodology/Principal Findings

Here we describe metagenomic sequences obtained from a Lost City carbonate chimney that are highly similar to the genome of Thiomicrospira crunogena XCL-2, an isolate from a basalt-hosted hydrothermal vent in the Pacific Ocean. Even though T. crunogena and Lost City Thiomicrospira inhabit different types of hydrothermal systems in different oceans, their genomic contents are highly similar. For example, sequences encoding the sulfur oxidation and carbon fixation pathways (including a carbon concentration mechanism) of T. crunogena are also present in the Lost City metagenome. Comparative genomic analyses also revealed substantial genomic changes that must have occurred since the divergence of the two lineages, including large genomic rearrangements, gene fusion events, a prophage insertion, and transposase activity.

Conclusions/Significance

Our results show significant genomic similarity between Thiomicrospira organisms inhabiting different kinds of hydrothermal systems in different oceans, suggesting that these organisms are widespread and highly adaptable. These data also indicate genomic processes potentially associated with the adaptation of these lineages into strikingly different habitats.

2.

Background

Multi Drug Resistant Tuberculosis (MDR TB) is a threat to global tuberculosis control. A significant fitness cost has been associated with DR strains from specific lineages. Evaluation of the influence of the competing drug susceptible strains on fitness of drug resistant strains may have an important bearing on understanding the spread of MDR TB. The aim of this study was to evaluate the fitness of MDR TB strains, from a TB endemic region of western India: Mumbai, belonging to 3 predominant lineages namely CAS, Beijing and MANU in the presence of drug susceptible strains from the same lineages.

Methodology

Drug susceptible strains from a single lineage were mixed with a drug resistant strain bearing a particular non-synonymous mutation (rpoB D516V; inhA A16G; katG S315T1/T2) from the same or a different lineage. Fitness of M. tuberculosis (M. tb) strains was evaluated using the difference in growth rates obtained with the CFU assay system.
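The fitness comparison described above can be illustrated with a minimal sketch. The abstract does not give the formula used, so this assumes simple exponential growth between two CFU counts; the function names and the tuple layout are hypothetical.

```python
import math


def growth_rate(cfu_start, cfu_end, days):
    """Exponential growth rate per day between two CFU counts (assumed model)."""
    if cfu_start <= 0 or cfu_end <= 0 or days <= 0:
        raise ValueError("CFU counts and elapsed days must be positive")
    return math.log(cfu_end / cfu_start) / days


def fitness_difference(resistant, susceptible, days):
    """Growth-rate difference between two strains grown in the same mixture.

    Each strain is given as a (cfu_start, cfu_end) tuple. A negative result
    means the resistant strain grew more slowly than the susceptible one.
    """
    return growth_rate(*resistant, days) - growth_rate(*susceptible, days)
```

Under this assumption, a value near zero would indicate no measurable fitness cost in the presence of the competing strain.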

Conclusion/Significance

While MANU were most fit amongst the drug susceptible strains of the 3 lineages, only Beijing MDR strains were found to grow in the presence of any of the competing drug susceptible strains. A disproportionate increase in Beijing MDR could be an alarm for an impending epidemic in this locale. In addition to particular non-synonymous substitutions, the competing strains in an environment may impact the fitness of circulating drug resistant strains.

3.

Background

The phylum Nematoda is biologically diverse, including parasites of plants and animals as well as free-living taxa. Underpinning this diversity will be commensurate diversity in expressed genes, including gene sets associated specifically with evolution of parasitism.

Methods and Findings

Here we have analyzed the extensive expressed sequence tag data (available for 37 nematode species, most of which are parasites) and define over 120,000 distinct putative genes from which we have derived robust protein translations. Combined with the complete proteomes of Caenorhabditis elegans and Caenorhabditis briggsae, these proteins have been grouped into 65,000 protein families that in turn contain 40,000 distinct protein domains. We have mapped the occurrence of domains and families across the Nematoda and compared the nematode data to that available for other phyla. Gene loss is common, and in particular we identify nearly 5,000 genes that may have been lost from the lineage leading to the model nematode C. elegans. We find a preponderance of novelty, including 56,000 nematode-restricted protein families and 26,000 nematode-restricted domains. Mapping of the latest time-of-origin of these new families and domains across the nematode phylogeny revealed ongoing evolution of novelty. A number of genes from parasitic species had signatures of horizontal transfer from their host organisms, and parasitic species had a greater proportion of novel, secreted proteins than did free-living ones.

Conclusions

These classes of genes may underpin parasitic phenotypes, and thus may be targets for development of effective control measures.

4.

Background

The 16S rRNA gene is the gold standard in molecular surveys of bacterial and archaeal diversity, but it has the disadvantages that it is often multiple-copy, has little resolution below the species level and cannot be readily interpreted in an evolutionary framework. We compared the 16S rRNA marker with the single-copy, protein-coding rpoB marker by amplifying and sequencing both from a single soil sample. Because the higher genetic resolution of the rpoB gene prohibits its use as a universal marker, we employed consensus-degenerate primers targeting the Proteobacteria.

Methodology/Principal Findings

Pyrosequencing can be problematic because of the poor resolution of homopolymer runs. As these erroneous runs disrupt the reading frame of protein-coding sequences, removal of sequences containing nonsense mutations was found to be a valuable filter in addition to flowgram-based denoising. Although both markers gave similar estimates of total diversity, the rpoB marker revealed more species, requiring an order of magnitude fewer reads to obtain 90% of the true diversity. The application of population genetic methods was demonstrated on a particularly abundant sequence cluster.
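The nonsense-mutation filter mentioned above can be sketched as follows. This is an illustration rather than the authors' pipeline: it assumes each read is a fragment of a protein-coding gene in a known reading frame, so any stop codon in that frame marks the read as suspect. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(read, frame=0):
    """Return True if the read contains a stop codon in the given frame."""
    read = read.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    codons = (read[i:i + 3] for i in range(frame, len(read) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


def filter_reads(reads, frame=0):
    """Keep only reads whose reading frame is free of stop codons."""
    return [read for read in reads if not has_internal_stop(read, frame)]
```

A homopolymer miscall shifts the frame downstream of the error, which is why frame-disrupted reads tend to pick up stop codons and are caught by a check of this kind.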

Conclusions/Significance

The rpoB marker can be a complement to the 16S rRNA marker for high throughput microbial diversity studies focusing on specific taxonomic groups. Additional error filtering is possible and tests for recombination or selection can be employed.

5.

Background

Haloquadratum walsbyi represents up to 80% of cells in NaCl-saturated brines worldwide, but is notoriously difficult to maintain under laboratory conditions. In order to establish the extent of genetic diversity in a natural population of this microbe, we screened an H. walsbyi-enriched metagenomic fosmid library and recovered seven novel versions of its cell-wall associated genomic island. The fosmid inserts were sequenced and analysed.

Results

The novel cell-wall associated islands delineated two major clades within H. walsbyi. The islands predominantly contained genes putatively involved in biosynthesis of surface layer, genes encoding cell surface glycoproteins and genes involved in envelope formation. We further found that these genes are maintained in the population and that the diversity of this region arises through homologous recombination but also through the action of mobile genetic elements, including viruses.

Conclusions

The population of H. walsbyi in the studied saltern brine is composed of numerous clonal lineages that differ in surface structures including the cell wall. This type of variation probably reflects a number of mechanisms that minimize the infection rate of predating viruses.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1794-8) contains supplementary material, which is available to authorized users.

6.

Background

The complexity of phosphoinositide signaling in higher eukaryotes is partly due to expansion of specific families and types of phosphoinositide kinases (PIKs) that can generate all phosphoinositides via multiple routes. This is particularly evident in the PI3Ks and PIPKs, and it is considered an evolutionary trait associated with metazoan diversification. Yet, there are limited comprehensive studies on the PIK repertoire of free living unicellular organisms.

Methodology/Principal Findings

We undertook a genome-wide analysis of putative PIK genes in two free living ciliated cells, Tetrahymena and Paramecium. The Tetrahymena thermophila and Paramecium tetraurelia genomes were probed with representative kinases from all families and types. Putative homologs were verified by EST, microarray and deep RNA sequencing database searches and further characterized for domain structure, catalytic efficiency, expression patterns and phylogenetic relationships. In total, we identified and characterized 22 genes in the Tetrahymena thermophila genome and 62 highly homologous genes in Paramecium tetraurelia, suggesting a tight evolutionary conservation in the ciliate lineage. Comparison to the kinome of fungi reveals a significant expansion of PIK genes in ciliates.

Conclusions/Significance

Our study highlights four important aspects concerning ciliate and other unicellular PIKs. First, ciliate-specific expansion of PI4KIII-like genes. Second, presence of class I PI3Ks which, at least in Tetrahymena, are associated with a metazoan-type machinery for PIP3 signaling. Third, expansion of divergent PIPK enzymes such as the recently described type IV transmembrane PIPKs. Fourth, presence of possible type II PIPKs and presumably inactive PIKs (hence, pseudo-PIKs) not previously described. Taken together, our results provide a solid framework for future investigation of the roles of PIKs in ciliates and indicate that novel functions and novel regulatory pathways of phosphoinositides may be more widespread than previously thought in unicellular organisms.

7.

Background

Multidrug resistance is a critical factor in tuberculosis control. To gain better understanding of multidrug resistant tuberculosis in Brazil, a retrospective study was performed to compare genotypic diversity and drug resistance associated mutations in Mycobacterium tuberculosis isolates from a national reference center.

Methods and Findings

Ninety-nine multidrug resistant isolates from 12 Brazilian states were studied. Drug-resistance patterns were determined and the rpoB and katG genes were screened for mutations. Genotypic diversity was investigated by IS6110-RFLP and Luminex 47 spoligotyping. Mutations in rpoB and katG were seen in 91% and 93% of the isolates, respectively. Codon 315 katG mutations occurred in 82.8% of the isolates with a predominance of the Ser315Thr substitution. Twenty-five isolates were clustered in 11 groups with identical IS6110-RFLP patterns while 74 showed unique patterns with no association between mutation frequencies or susceptibility profiles. The most prevalent spoligotyping lineages were LAM (47%), T (17%) and Haarlem (12%). The Haarlem lineage showed a higher frequency of codon 516 rpoB mutations while codon 531 mutations prevailed in the other isolates.

Conclusions

Our data suggest that there were no major multidrug resistant M. tuberculosis strains transmitted among patients referred to the reference center, indicating an independent acquisition of resistance. In addition, drug resistance associated mutation profiles were well established among the main spoligotyping lineages found in these Brazilian multidrug resistant isolates, providing useful data for patient management and treatment.

8.

Background

ALKBH proteins, the homologs of Escherichia coli AlkB dioxygenase, constitute a direct, single-protein repair system, protecting cellular DNA and RNA against the cytotoxic and mutagenic activity of alkylating agents, chemicals significantly contributing to tumor formation and used in cancer therapy. In silico analysis and in vivo studies have shown the existence of AlkB homologs in almost all organisms. Nine AlkB homologs (ALKBH1–8 and FTO) have been identified in humans. High ALKBH levels have been found to encourage tumor development, questioning the use of alkylating agents in chemotherapy. The aim of this work was to assign biological significance to multiple AlkB homologs by characterizing their activity in the repair of nucleic acids in prokaryotes and their subcellular localization in eukaryotes.

Methodology and Findings

Bioinformatic analysis of protein sequence databases identified 1943 AlkB sequences with eight new AlkB subfamilies. Since Cyanobacteria and Arabidopsis thaliana contain multiple AlkB homologs, they were selected as model organisms for in vivo research. Using E. coli alkB mutant and plasmids expressing cyanobacterial AlkBs, we studied the repair of methyl methanesulfonate (MMS) and chloroacetaldehyde (CAA) induced lesions in ssDNA, ssRNA, and genomic DNA. On the basis of GFP fusions, we investigated the subcellular localization of ALKBHs in A. thaliana and established its mostly nucleo-cytoplasmic distribution. Some of the ALKBH proteins were found to change their localization upon MMS treatment.

Conclusions

Our in vivo studies showed highly specific activity of cyanobacterial AlkB proteins towards lesions and nucleic acid type. Subcellular localization and translocation of ALKBHs in A. thaliana indicates a possible role for these proteins in the repair of alkyl lesions. We hypothesize that the multiplicity of ALKBHs is due to their involvement in the metabolism of nucleo-protein complexes; we find their repair by ALKBH proteins to be an economical and effective alternative to degradation and de novo synthesis.

9.

Background

Polyketides are a diverse group of biotechnologically important secondary metabolites that are produced by multi domain enzymes called polyketide synthases (PKS).

Methodology/Principal Findings

We have estimated frequencies of type I PKS (PKS I) – a PKS subgroup – in natural environments by using Hidden-Markov-Models of eight domains to screen predicted proteins from six metagenomic shotgun data sets. As the complex PKS I have similarities to other multi-domain enzymes (like those for the fatty acid biosynthesis) we increased the reliability and resolution of the dataset by maximum-likelihood trees. The combined information of these trees was then used to discriminate true PKS I domains from evolutionary related but functionally different ones. We were able to identify numerous novel PKS I proteins, the highest density of which was found in Minnesota farm soil with 136 proteins out of 183,536 predicted genes. We also applied the protocol to UniRef database to improve the annotation of proteins with so far unknown function and identified some new instances of horizontal gene transfer.

Conclusions/Significance

The screening approach proved powerful in identifying PKS I sequences in large sequence data sets and is applicable to many other protein families.

10.

Background

Hemorrhagic fever with renal syndrome (HFRS) is highly endemic in mainland China, and has extended from rural areas to cities recently. Beijing metropolis is a novel affected region, where the HFRS incidence seems to be diverse from place to place.

Methodology/Principal Findings

Spatial scan analysis based on a geographical information system (GIS) identified three geo-spatial “hotspots” of HFRS in Beijing when the passive surveillance data from 2004 to 2006 were used. The relative risk (RR) of the three “hotspots” was 5.45, 3.57 and 3.30, respectively. Phylogenetic analysis based on the entire coding region sequence of the S segment and a partial L segment sequence of Seoul virus (SEOV) revealed that the SEOV strains circulating in Beijing could be classified into at least three lineages regardless of their host origins. Two potential recombination events that happened in lineage #1 were detected and supported by comparative phylogenetic analysis. The SEOV strains in different lineages and strains with distinct special amino acid substitutions for N protein were partially associated with different spatial clustered areas of HFRS.

Conclusion/Significance

Hotspots of HFRS were found in Beijing, a novel endemic region, where intervention should be enhanced. Our data suggested that the genetic variation and recombination of SEOV strains was related to the high risk areas of HFRS, which merited further investigation.

11.

Background

The bacterial genus Salmonella contains thousands of serotypes that infect humans or other hosts, causing mild gastroenteritis to potentially fatal systemic infections in humans. Pathogenically distinct Salmonella serotypes have been classified as individual species or as serological variants of merely one or two species, causing considerable confusion in both research and clinical settings. This situation reflects a long unanswered question regarding whether the Salmonella serotypes exist as discrete genetic clusters (natural species) of organisms or as phenotypic (e.g. pathogenic) variants of a single (or two) natural species with a continuous spectrum of genetic divergence among them. Our recent work, based on genomic sequence divergence analysis, has demonstrated that genetic boundaries exist among Salmonella serotypes, circumscribing them into clear-cut genetic clusters of bacteria.

Methodologies/Principal Findings

To further test the genetic boundary concept for delineating Salmonella into clearly defined natural lineages (e.g., species), we sampled a small subset of conserved genomic DNA sequences, i.e., the endonuclease cleavage sites that contain the highly conserved CTAG sequence such as TCTAGA for XbaI. We found that the CTAG-containing cleavage sequence profiles could be used to resolve the genetic boundaries as reliably and efficiently as whole genome sequence comparisons but with enormously reduced requirements for time and resources.
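As a rough illustration of what a CTAG-containing cleavage-site profile might look like, the sketch below locates such sites in a DNA string. Only XbaI (TCTAGA) is named in the abstract; the other three enzymes are added here as well-known CTAG-containing recognition sites, and the function names and profile layout are hypothetical rather than the authors' method.

```python
# Recognition sequences that contain the CTAG tetranucleotide.
# XbaI comes from the abstract; the rest are added for illustration.
CTAG_SITES = {
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "AvrII": "CCTAGG",
    "NheI": "GCTAGC",
}


def site_positions(genome, site="TCTAGA"):
    """Return 0-based start positions of every occurrence of a site."""
    genome, site = genome.upper(), site.upper()
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are also found.
        start = genome.find(site, start + 1)
    return positions


def cleavage_profile(genome):
    """Map each enzyme name to the positions of its site in the genome."""
    return {name: site_positions(genome, site) for name, site in CTAG_SITES.items()}
```

Profiles from two genomes could then be compared site by site; how that comparison is scored is not described in the abstract and is left open here.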

Conclusions

Profiling of CTAG sequence subsets reflects genetic boundaries among Salmonella lineages and can delineate these bacteria into discrete natural clusters.

12.
Li W  Wooley JC  Godzik A 《PloS one》2008,3(10):e3375

Background

The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods.

Methodology/Principal Findings

In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations.

Conclusion/Significance

Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.

13.

Objective

The goal of this study was to identify mutations in 25 known causative genes in 47 unrelated Chinese families with cone-rod dystrophy (CORD).

Methods

Forty-seven probands from unrelated families with CORD were recruited. Genomic DNA prepared from leukocytes was analyzed by whole exome sequencing. Variants in the 25 genes were selected and then validated by Sanger sequencing.

Results

Fourteen potentially pathogenic mutations, including nine novel and five known, were identified in 10 of the 47 families (21.28%). Homozygous, compound heterozygous, and hemizygous mutations were detected in three, four, and three families, respectively. The 14 mutations in the 10 families were distributed among CNGB3 (three families), PDE6C (two families), ABCA4 (one family), RPGRIP1 (one family), RPGR (two families), and CACNA1F (one family).

Conclusions

This study provides a brief view of the mutation spectrum of the 25 genes in a Chinese cohort with CORD. Identification of novel mutations enriched our understanding of variations in these genes and their associated phenotypes. To our knowledge, this is the first systematic exome-sequencing analysis of all of the 25 CORD-associated genes.

14.

Background

Researchers sorely need markers and approaches for biodiversity exploration (both specimen linked and metagenomics) using the full potential of next generation sequencing technologies (NGST). Currently, most studies rely on expensive multiple tagging, PCR primer universality and/or the use of few markers, sometimes with insufficient variability.

Methodology/Principal Findings

We propose a novel approach for the isolation and sequencing of a universal, useful and popular marker across distant, non-model metazoans: the complete mitochondrial genome. It relies on the properties of metazoan mitogenomes for enrichment, on careful choice of the organisms to multiplex, as well as on the wide collection of accumulated mitochondrial reference datasets for post-sequencing sorting and identification instead of individual tagging. Multiple divergent organisms can be sequenced simultaneously, and their complete mitogenome obtained at a very low cost. We provide in silico testing of dataset assembly for a selected set of example datasets.

Conclusions/Significance

This approach generates large mitogenome datasets. These sequences are useful for phylogenetics, molecular identification and molecular ecology studies, and are compatible with all existing projects or available datasets based on mitochondrial sequences, such as the Barcode of Life project. Our method can yield sequences both from identified samples and metagenomic samples. The use of the same datasets for both kinds of studies makes for a powerful approach, especially since the datasets have a high variability even at species level, and would be a useful complement to the less variable 18S rDNA currently prevailing in metagenomic studies.

15.
16.

Background

Fungi are important pathogens but challenging to enumerate using next-generation sequencing because of low absolute abundance in many samples and high levels of fungal DNA from contaminating sources.

Results

Here, we analyze fungal lineages present in the human airway using an improved method for contamination filtering. We use DNA quantification data, which are routinely acquired during DNA library preparation, to annotate output sequence data, and improve the identification and filtering of contaminants. We compare fungal communities and bacterial communities from healthy subjects, HIV+ subjects, and lung transplant recipients, providing a gradient of increasing lung impairment for comparison. We use deep sequencing to characterize ribosomal RNA gene segments from fungi and bacteria in DNA extracted from bronchiolar lavage samples and oropharyngeal wash. Comparison to clinical culture data documents improved detection after applying the filtering procedure.

Conclusions

We find increased representation of medically relevant organisms, including Candida, Cryptococcus, and Aspergillus, in subjects with increasingly severe pulmonary and immunologic deficits. We analyze covariation of fungal and bacterial taxa, and find that oropharyngeal communities rich in Candida are also rich in mitis group Streptococci, a community pattern associated with pathogenic polymicrobial biofilms. Thus, using this approach, it is possible to characterize fungal communities in the human respiratory tract more accurately and explore their interactions with bacterial communities in health and disease.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0487-y) contains supplementary material, which is available to authorized users.

17.

Background

The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.

Results

We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets.
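The k-mer distributions referred to above can be computed from raw reads with a few lines of code. The sketch below only counts overlapping k-mers and normalizes them to frequencies; how the resulting distribution is turned into a quality score is not specified in the abstract, so that step is omitted. Skipping k-mers that contain N is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count overlapping k-mers across all reads, skipping any that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_frequencies(reads, k):
    """Normalize k-mer counts so that the frequencies sum to 1."""
    counts = kmer_counts(reads, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

A dataset dominated by a handful of k-mers, as might be expected from short artefactual ligation products, would show up as a sharply skewed frequency table.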

Conclusions

We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.

18.

Background

Micro (mi)RNAs are important regulators of plant development. Across plant lineages, Dicer-like 1 (DCL1) proteins process long ds-like structures to produce miRNA duplexes in a stepwise manner. These miRNAs are incorporated into Argonaute (AGO) proteins and influence expression of RNAs that have sequence complementarity with miRNAs. Expression levels of AGOs are greatly regulated by plants in order to minimize unwarranted perturbations using miRNAs to target mRNAs coding for AGOs. AGOs may also have high promoter specificity; sometimes expression of an AGO can be limited to just a few cells in a plant. Viral pathogens utilize various means to counter antiviral roles of AGOs including hijacking the host encoded miRNAs to target AGOs. Two host encoded miRNAs, namely miR168 and miR403, that target AGOs have been described in the model plant Arabidopsis, and such a mechanism is thought to be well conserved across plants because AGO sequences are well conserved.

Results

We show that the interaction between AGO mRNAs and miRNAs is species-specific due to the diversity in sequences of two miRNAs that target AGOs, sequence diversity among corresponding target regions in AGO mRNAs and variable expression levels of these miRNAs among vascular plants. We used miRNA sequences from 68 plant species representing 31 plant families for this analysis. Sequences of miR168 and miR403 are not conserved among plant lineages, but surprisingly they differ drastically in their sequence diversity and expression levels even among closely related plants. Variation in miR168 expression among plants correlates well with secondary structures/length of loop sequences of their precursors.

Conclusions

Our data indicate a complex AGO targeting interaction among plant lineages due to miRNA sequence diversity and sequences of miRNA targeting regions among AGO mRNAs, thus leading to the assumption that the perturbations by viruses that use host miRNAs to target antiviral AGOs can only be species-specific. We also show that rapid evolution and likely loss of expression of miR168 isoforms in tobacco is related to the insertion of MITE-like transposons between miRNA and miRNA* sequences, a possible mechanism showing how miRNAs are lost in a few plant lineages even though other close relatives have abundantly expressing miRNAs.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1049) contains supplementary material, which is available to authorized users.

19.
20.

Background

Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that include microbes in them, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which the organism's DNA was observed in reads generated via DNA sequencing.

Methodology/Principal Findings

We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism were observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells that were known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol utilized.
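One way to picture the comparison between observed and expected abundances is sketched below. The abstract does not describe the calculation, so this assumes that read counts are divided by genome size to approximate per-cell coverage and then normalized; the function names and input dictionaries are hypothetical.

```python
def relative_abundance(read_counts, genome_sizes):
    """Estimate relative abundance from mapped reads, normalized by genome size.

    Dividing by genome size is an assumption: with equal cell numbers, a larger
    genome is expected to contribute proportionally more reads.
    """
    coverage = {name: read_counts[name] / genome_sizes[name] for name in read_counts}
    total = sum(coverage.values())
    return {name: value / total for name, value in coverage.items()}


def fold_deviation(observed, expected):
    """Ratio of each observed abundance to a single expected abundance."""
    return {name: value / expected for name, value in observed.items()}
```

For an equal-cell mixture of ten species the expected abundance would be 0.1 for each organism, so a fold deviation far from 1 would indicate extraction or sequencing bias of the kind the study reports.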

Conclusions/Significance

We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro-simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with different protocols are not suitable for comparative metagenomics.
