Similar articles
1.
2.
3.
Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. An implementation of our methodology is available at http://huttenhower.sph.harvard.edu/humann. 
This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies.

4.
SUMMARY: Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. AVAILABILITY: The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

5.
Viruses are the most abundant biological entities on the planet and play an important role in balancing microbes within an ecosystem and facilitating horizontal gene transfer. Although bacteriophages are abundant in rumen environments, little is known about the types of viruses present or their interaction with the rumen microbiome. We undertook random pyrosequencing of virus-enriched metagenomes (viromes) isolated from bovine rumen fluid and analysed the resulting data using comparative metagenomics. A high level of diversity was observed with up to 28,000 different viral genotypes obtained from each environment. The majority (~78%) of sequences did not match any previously described virus. Prophages outnumbered lytic phages approximately 2:1 with the most abundant bacteriophage and prophage types being associated with members of the dominant rumen phyla (Firmicutes and Proteobacteria). Metabolic profiling based on SEED subsystems revealed an enrichment of sequences with putative functional roles in DNA and protein metabolism, but a surprisingly low proportion of sequences assigned to carbohydrate and amino acid metabolism. We expanded our analysis to include previously described metagenomic data and 14 reference genomes. Clustered regularly interspaced short palindromic repeats (CRISPR) were detected in most of the microbial genomes, suggesting previous interactions between viral and microbial communities.

6.
While current major national research efforts (i.e., the NIH Human Microbiome Project) will enable comprehensive metagenomic characterization of the adult human microbiota, how and when these diverse microbial communities take up residence in the host and during reproductive life are unexplored at a population level. Because microbial abundance and diversity might differ in pregnancy, we sought to generate comparative metagenomic signatures across gestational age strata. DNA was isolated from the vagina (introitus, posterior fornix, midvagina) and the V5V3 region of bacterial 16S rRNA genes was sequenced (454FLX Titanium platform). Sixty-eight samples from 24 healthy gravidae (18 to 40 confirmed weeks) were compared with 301 non-pregnant controls (60 subjects). Generated sequence data were quality filtered, taxonomically binned, normalized, and organized by phylogeny and into operational taxonomic units (OTU); principal coordinates analysis (PCoA) of the resultant beta diversity measures was used for visualization and analysis in association with sample clinical metadata. Altogether, 1.4 gigabytes of data containing >2.5 million reads (averaging 6,837 sequences/sample of 493 nt in length) were generated for computational analyses. Although gravidae were not excluded by virtue of a posterior fornix pH >4.5 at the time of screening, a unique vaginal microbiome signature encompassing several specific OTUs and higher-level clades was nevertheless observed and confirmed using a combination of phylogenetic, non-phylogenetic, supervised, and unsupervised approaches. Both overall diversity and richness were reduced in pregnancy, with dominance of Lactobacillus species (L. iners, L. crispatus, L. jensenii and L. johnsonii) and the orders Lactobacillales (and Lactobacillaceae family), Clostridiales, Bacteroidales, and Actinomycetales.
This intergroup comparison using rigorous standardized sampling protocols and analytical methodologies provides robust initial evidence that the vaginal microbial 16S rRNA gene catalogue uniquely differs in pregnancy, with variance of taxa across vaginal subsite and gestational age.
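
The abstract above reports richness, overall diversity and beta diversity computed from OTU counts, but does not say which indices were used. The following is a minimal sketch, assuming observed-OTU richness, the Shannon index and Bray-Curtis dissimilarity on relative abundances; the function names and the example counts are illustrative and are not taken from the study.

```python
import math

def richness(counts):
    """Number of OTUs observed at least once in a sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon diversity (natural log) of a vector of OTU counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples.

    Counts are first converted to relative abundances (an assumed
    normalization step), so the value lies between 0 and 1.
    """
    ta, tb = sum(a), sum(b)
    if ta == 0 or tb == 0:
        raise ValueError("each sample needs at least one read")
    return 0.5 * sum(abs(x / ta - y / tb) for x, y in zip(a, b))

# Hypothetical OTU count vectors (same OTU order in both samples).
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]
print(richness(even_sample), shannon(even_sample), shannon(dominated_sample))
print(bray_curtis(even_sample, dominated_sample))
```

A pairwise matrix of such dissimilarities is the usual input to a principal coordinates analysis; a sample dominated by one taxon, as described for Lactobacillus in pregnancy, gives a lower Shannon value than an even sample with the same richness.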

7.
MOTIVATION: Linking gene mentions in an article to entries of biological databases can facilitate indexing and querying biological literature greatly. Due to the high ambiguity of gene names, this task is particularly challenging. Manual annotation for this task is costly, time consuming and labor intensive. Therefore, providing assistive tools to facilitate the task is of high value. RESULTS: We developed GeneTUKit, a document-level gene normalization software for full-text articles. This software employs both local context surrounding gene mentions and global context from the whole full-text document. It can normalize genes of different species simultaneously. When participating in BioCreAtIvE III, the system obtained good results among 37 runs: the system was ranked first, fourth and seventh in terms of TAP-20, TAP-10 and TAP-5, respectively, on the 507 full-text test articles. Availability and implementation: The software is available at http://www.qanswers.net/GeneTUKit/.

8.
9.
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive, efficient and reproducible technique for studying gene expression. Identification of stably expressed reference genes is required to avoid bias in these studies yet mostly unvalidated reference genes are used in studying gene expression in Clostridium difficile. Here, we sought to identify a set of stable reference genes used to normalize C. difficile expression data comparing exponential versus stationary phases of growth. Eight candidate reference genes (rpoA, rrs, gyrA, gluD, adk, rpsJ, tpi, and rho) were assessed in 3 C. difficile genotypes (ribotypes 027, 078, and 001). The primers were analyzed for efficiency and the 8 genes were ranked according to their stability. Overall, the genes rrs, adk, and rpsJ ranked among the most stable. Identification of the most stable genes was, however, strain dependent and suggests that selection of reference genes in a heterogeneous species, such as C. difficile, requires multiple genes to be assessed to confirm their stability within the strains being studied.
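
The abstract above mentions two computations without giving details: primer efficiency and a stability ranking of candidate reference genes. The sketch below assumes the standard-curve relation E = 10^(-1/slope) - 1 for efficiency, and uses the standard deviation of Cq values across conditions as a deliberately simple stability score. The study's actual ranking algorithm is not stated in the abstract, and the Cq values shown are hypothetical.

```python
from statistics import stdev

def primer_efficiency(slope):
    """Amplification efficiency from the slope of a standard curve
    (Cq plotted against log10 of template amount); 1.0 means 100%."""
    return 10 ** (-1.0 / slope) - 1.0

def rank_by_stability(cq_by_gene):
    """Order genes from most to least stable.

    Assumption: stability is scored as the standard deviation of Cq
    across samples (smaller is more stable).
    """
    return sorted(cq_by_gene, key=lambda gene: stdev(cq_by_gene[gene]))

# Hypothetical Cq measurements across growth phases, for illustration only.
cq = {
    "rrs": [10.0, 10.1, 9.9],
    "gyrA": [20.0, 22.0, 18.0],
    "adk": [15.0, 15.5, 14.5],
}
print(rank_by_stability(cq))
print(round(primer_efficiency(-3.32), 3))
```

Because the abstract reports that the ranking was strain dependent, a score like this would be computed separately for each ribotype rather than on pooled data.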

10.
11.
Given the absence of universal marker genes in the viral kingdom, researchers typically use BLAST (with stringent E-values) for taxonomic classification of viral metagenomic sequences. Since the majority of metagenomic sequences originate from hitherto unknown viral groups, using stringent E-values results in most sequences remaining unclassified. Furthermore, using less stringent E-values results in a high number of incorrect taxonomic assignments. The SOrt-ITEMS algorithm provides an approach to address the above issues. Based on alignment parameters, SOrt-ITEMS follows an elaborate work-flow for assigning reads originating from hitherto unknown archaeal/bacterial genomes. In SOrt-ITEMS, alignment parameter thresholds were generated by observing patterns of sequence divergence within and across various taxonomic groups belonging to bacterial and archaeal kingdoms. However, many taxonomic groups within the viral kingdom lack a typical Linnean-like taxonomic hierarchy. In this paper, we present ProViDE (Program for Viral Diversity Estimation), an algorithm that uses a customized set of alignment parameter thresholds, specifically suited for viral metagenomic sequences. These thresholds capture the pattern of sequence divergence and the non-uniform taxonomic hierarchy observed within/across various taxonomic groups of the viral kingdom. Validation results indicate that the percentage of 'correct' assignments by ProViDE is around 1.7 to 3 times higher than that by the widely used similarity based method MEGAN. The misclassification rate of ProViDE is around 3 to 19% (as compared to 5 to 42% by MEGAN), indicating significantly better assignment accuracy. ProViDE software and a supplementary file (containing supplementary figures and tables referred to in this article) are available for download from http://metagenomics.atc.tcs.com/binning/ProViDE/

12.

Background

Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of the known organisms. A large portion of the sequenced reads may be classified as unknown, which greatly impairs our understanding of the whole sample.

Result

Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition, and uses GPUs to accelerate its speed. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of the MetaSUB Inter-City Challenge provided by the CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities.

Conclusion

Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms.

Reviewers

This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.
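
The abstract above states that MetaBinG2 classifies reads by sequence composition but gives no further detail. The following is a generic illustration of composition-based classification, not MetaBinG2's actual model: each sequence is reduced to a k-mer frequency profile and a read is assigned to the reference with the most similar profile. The choice of k = 3, cosine similarity and the toy reference sequences are all assumptions made for this sketch.

```python
import math
from collections import Counter
from itertools import product

def kmer_profile(seq, k=3):
    """Relative frequencies of all 4**k DNA k-mers in seq.

    K-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(read, references, k=3):
    """Return the name of the reference whose k-mer profile is closest to the read's."""
    read_profile = kmer_profile(read, k)
    return max(
        references,
        key=lambda name: cosine(read_profile, kmer_profile(references[name], k)),
    )

# Toy reference sequences, purely illustrative.
references = {
    "at_rich": "ATATATATTTAAATATAT" * 3,
    "gc_rich": "GCGCGGCCGCGCGGGCCC" * 3,
}
print(classify("ATATTTAAAT", references))
print(classify("GCGGCCGCGC", references))
```

Since no alignment is involved, a read from an organism absent from the reference set still receives the compositionally closest label, which is one reason composition-based methods are attractive for samples with many unknown organisms.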

13.
Decisions guiding environmental management need to be based on a broad and comprehensive understanding of the biodiversity and functional capability within ecosystems. Microbes are of particular importance since they drive biogeochemical cycles, being both producers and decomposers. Their quick and direct responses to changes in environmental conditions modulate the ecosystem accordingly, thus providing a sensitive readout. Here we have used direct sequencing of total DNA from water samples to compare the microbial communities of two distinct coastal regions exposed to different anthropogenic pressures: the highly polluted Port of Genoa and the protected area of Montecristo Island in the Mediterranean Sea. Analysis of the metagenomes revealed significant differences in both microbial diversity and abundance between the two areas, reflecting their distinct ecological habitats and anthropogenic stress conditions. Our results indicate that the combination of next generation sequencing (NGS) technologies and bioinformatics tools presents a new approach to monitor the diversity and the ecological status of aquatic ecosystems. Integration of metagenomics into environmental monitoring campaigns should enable the impact of the anthropogenic pressure on microbial biodiversity in various ecosystems to be better assessed and also predicted.

14.

Background  

Assessment of gene expression is an important component of osteoarthritis (OA) research, greatly improved by the development of quantitative real-time PCR (qPCR). This technique requires normalization for precise results, yet no suitable reference genes have been identified in human articular cartilage. We have examined ten well-known reference genes to determine the most adequate for this application.

15.
Selectable marker gene systems are vital for the development of transgenic crops. Since the creation of the first transgenic plants in the early 1980s and their subsequent commercialization worldwide over almost an entire decade, antibiotic and herbicide resistance selectable marker gene systems have been an integral feature of plant genetic modification. Without them, creating transgenic crops is not feasible on purely economic and practical terms. These systems allow the relatively straightforward identification and selection of plants that have stably incorporated not only the marker genes but also genes of interest, for example herbicide tolerance and pest resistance. Bacterial antibiotic resistance genes are also crucial in molecular biology manipulations in the laboratory. An unprecedented debate has accompanied the development and commercialization of transgenic crops. Divergent policies and their implementation in the European Union on one hand and the rest of the world on the other (industrialized and developing countries alike) have resulted in disputes with serious consequences for agricultural policy, world trade and food security. A lot of research effort has been directed towards the development of marker-free transformation or systems to remove selectable markers. Such research has been in large part motivated by perceived problems with antibiotic resistance selectable markers; however, it is not justified from a safety point of view. The aim of this review is to discuss in some detail the currently available scientific evidence that overwhelmingly argues for the safety of these marker gene systems. Our conclusion, supported by numerous studies, most of which are commissioned by some of the very parties that have taken a position against the use of antibiotic selectable marker gene systems, is that there is no scientific basis to argue against the use and presence of selectable marker genes as a class in transgenic plants.

16.
17.
Microarrays measure values that are approximately proportional to the numbers of copies of different mRNA molecules in samples. Due to technical difficulties, the constant of proportionality between the measured intensities and the numbers of mRNA copies per cell is unknown and may vary for different arrays. Usually, the data are normalized (i.e., array-wise multiplied by appropriate factors) in order to compensate for this effect and to enable informative comparisons between different experiments. Centralization is a new two-step method for the computation of such normalization factors that is both biologically better motivated and more robust than standard approaches. First, for each pair of arrays the quotient of the constants of proportionality is estimated. Second, from the resulting matrix of pairwise quotients an optimally consistent scaling of the samples is computed.
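
The two steps described above can be sketched as follows. The abstract does not specify the estimators, so this sketch assumes the median of gene-wise intensity ratios as the pairwise quotient, and the row-wise geometric mean of the quotient matrix as the scaling that reconciles the pairwise estimates. Both are simple stand-ins chosen for illustration, not the published Centralization procedure.

```python
import math
from statistics import median

def pairwise_quotients(arrays):
    """Step 1: estimate, for each pair (i, j), the ratio of the
    proportionality constants of array i and array j.

    Assumption: the median of gene-wise intensity ratios is the estimator.
    """
    n = len(arrays)
    q = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                ratios = [a / b for a, b in zip(arrays[i], arrays[j])
                          if a > 0 and b > 0]
                q[i][j] = median(ratios)
    return q

def consistent_scaling(q):
    """Step 2: derive one scale factor per array from the quotient matrix.

    Assumption: the geometric mean of row i is used, which reproduces
    q[i][j] = s[i] / s[j] exactly when the matrix is fully consistent.
    """
    n = len(q)
    return [math.exp(sum(math.log(q[i][j]) for j in range(n)) / n)
            for i in range(n)]

def centralize(arrays):
    """Divide each array by its estimated scale factor."""
    scales = consistent_scaling(pairwise_quotients(arrays))
    return [[x / s for x in arr] for arr, s in zip(arrays, scales)]

# Three hypothetical arrays measuring the same sample at different gains.
base = [10.0, 20.0, 40.0, 80.0]
arrays = [base, [2 * x for x in base], [0.5 * x for x in base]]
print(consistent_scaling(pairwise_quotients(arrays)))
print(centralize(arrays))
```

Using medians in step 1 keeps a minority of genuinely differentially expressed genes from dominating the estimate, which is in the spirit of the robustness claimed in the abstract.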

18.
A method for mapping complex trait genes using cDNA microarray and molecular marker data jointly is presented and illustrated via simulation. We introduce a novel approach for simulating phenotypes and genotypes conditionally on real, publicly available, microarray data. The model assumes an underlying continuous latent variable (liability) related to some measured cDNA expression levels. Partial least-squares logistic regression is used to estimate the liability under several scenarios where the level of gene interaction, the gene effect, and the number of cDNA levels affecting liability are varied. The results suggest that: (1) the usefulness of microarray data for gene mapping increases when both the number of cDNA levels in the underlying liability and the QTL effect decrease and when genes are coexpressed; (2) the correlation between estimated and true liability is large, at least under our simulation settings; (3) it is unlikely that cDNA clones identified as significant with partial least squares (or with some other technique) are the true responsible cDNAs, especially as the number of clones in the liability increases; (4) the number of putatively significant cDNA levels increases critically if cDNAs are coexpressed in a cluster (however, the proportion of true causal cDNAs within the significant ones is similar to that in a no-coexpression scenario); and (5) data reduction is needed to smooth out the variability encountered in expression levels when these are analyzed individually.

19.

Background  

In two-channel competitive genomic hybridization microarray experiments, the ratio of the two fluorescent signal intensities at each spot on the microarray is commonly used to infer the relative amounts of the test and reference sample DNA levels. This ratio may be influenced by systematic measurement effects from non-biological sources that can introduce biases in the estimated ratios. These biases should be removed before drawing conclusions about the relative levels of DNA. The performance of existing gene expression microarray normalization strategies has not been evaluated for removing systematic biases encountered in array-based comparative genomic hybridization (CGH), which aims to detect single copy gains and losses typically in samples with heterogeneous cell populations resulting in only slight shifts in signal ratios. The purpose of this work is to establish a framework for correcting the systematic sources of variation in high density CGH array images, while maintaining the true biological variations.

20.
Housekeeping genes are widely used as internal controls for gene expression normalization for western blotting, northern blotting, RT-PCR, etc. They are generally thought to be expressed in all cells of the organism at similar levels because it is assumed that these genes are required for the maintenance of basic cellular function as constitutive genes. However, real-time RT-PCR experiments revealed that their expression may vary depending on the developmental stage, type of tissue examined, experimental condition, and so on. To date, no histological data on their expression are available for embryonic development. In the present study, we compared the histological expression profile of two commonly used housekeeping genes, GAPDH and beta-actin, in the developing chicken embryo by using section and whole mount in situ hybridization supplemented by RT-PCR. Our results show that neither GAPDH mRNA nor beta-actin mRNA is expressed in all cell types or tissues at high levels. Strikingly, expression levels are very low in some organs. Moreover, the two genes show partially complementary expression patterns in the liver, the vascular system and the digestive tract. For example, GAPDH is more strongly expressed in the liver than beta-actin, but at lower levels in the arteries. Vice versa, beta-actin is more strongly expressed in the gizzard than GAPDH, but it is almost absent from cardiac muscle cells. Researchers should consider these histological results when using GAPDH and beta-actin for gene expression normalization in their experiments.
