Similar Documents
20 similar documents found.
1.
Massive DNA sequencing studies have expanded our understanding of the ecological and functional characteristics of the gut microbiome. Advanced sequencing technologies allow us to understand the close association of the gut microbiome with human health and critical illnesses. In the future, analyses of the gut microbiome will provide key information about individual health, helping to deliver personalized health care. Numerous molecular biological analysis tools have been rapidly developed and employed for gut microbiome research; however, methodological differences among researchers lead to inconsistent data, limiting broad data sharing. It is therefore essential to standardize current methodologies and establish appropriate pipelines for human gut microbiome research. Herein, we review the methods and procedures currently available for studying the human gut microbiome, including fecal sample collection, metagenomic DNA extraction, massive DNA sequencing, and bioinformatic data analysis. We believe that this review will contribute to the progress of gut microbiome research in the clinical and practical aspects of human health.

2.
Habitat thresholds are usually defined as "points of abrupt change" in species–habitat relationships. Habitat thresholds can be a key tool for understanding species requirements and provide an objective definition of conservation targets, by identifying when habitat loss leads to a rapid loss of species and the minimum amount of habitat necessary for species persistence. However, a wide variety of statistical methods have been used to analyse them. In this context, we reviewed these methods and, using simulated data sets, tested the main models to compare their performance in identifying thresholds. We show that researchers use very different analytical tools, corresponding to different operational definitions of habitat thresholds, which can considerably affect threshold detection. Piecewise regression and generalized additive models allow both the distinction between linear and nonlinear dynamics and the correct identification of the break-point position. In contrast, other methods such as logistic regression fail because they may incorrectly detect thresholds in gradual patterns, or they may over- or underestimate the threshold position. In conservation and habitat modelling it is important to focus efforts efficiently, and an inappropriate choice of statistical methods may have detrimental consequences.
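As a rough illustration of the piecewise-regression approach favoured above, here is a minimal Python sketch (toy simulated data; the grid-search break-point fit and AIC comparison are my own assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated species-habitat relationship with a true break point at 30% habitat.
habitat = rng.uniform(0, 100, 200)
response = np.where(habitat < 30, 0.5 * habitat, 15 + 0.05 * (habitat - 30))
response += rng.normal(0, 1.0, habitat.size)

def piecewise_sse(x, y, bp):
    """Fit two connected linear segments split at bp; return the SSE."""
    basis = np.column_stack([np.ones_like(x), x, np.clip(x - bp, 0, None)])
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    resid = y - basis @ coef
    return float(resid @ resid)

# Grid search for the break point that minimizes the sum of squared errors.
candidates = np.linspace(habitat.min() + 5, habitat.max() - 5, 200)
sse = np.array([piecewise_sse(habitat, response, bp) for bp in candidates])
best_bp = candidates[sse.argmin()]

# Compare against a single straight line via AIC (Gaussian errors assumed).
n = habitat.size
lin_resid = response - np.polyval(np.polyfit(habitat, response, 1), habitat)
aic_linear = n * np.log(float(lin_resid @ lin_resid) / n) + 2 * 2
aic_piecewise = n * np.log(sse.min() / n) + 2 * 4  # intercept, 2 slopes, break point

print(f"estimated break point: {best_bp:.1f}")
print(f"AIC linear={aic_linear:.1f}  piecewise={aic_piecewise:.1f}")
```

On a truly gradual pattern the linear fit remains competitive under AIC, which is exactly the safeguard against spurious threshold detection that the abstract says logistic regression lacks.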

3.
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.

4.
Confirming microarray data--is it really necessary?
Rockett JC, Hellmann GM. Genomics, 2004, 83(4):541-549.
The generation of corroborative data has become a commonly used approach for ensuring the veracity of microarray data. Indeed, the need to conduct corroborative studies has now become official editorial policy for at least two journals, and several more are considering introducing such a policy. The issue of corroborating microarray data is a challenging one: there are good arguments both for and against conducting such experiments. However, we believe that the introduction of a fixed requirement to corroborate microarray data, especially if adopted by more journals, is overly burdensome and may, in at least several applications of microarray technology, be inappropriate. We also believe that, in cases in which corroborative studies are deemed essential, a lack of clear guidance leaves researchers unclear as to what constitutes an acceptable corroborative study. Guidelines have already been outlined regarding the details of conducting microarray experiments. We propose that all stakeholders, including journal editorial boards, reviewers, and researchers, should undertake concerted and inclusive efforts to properly address and clarify the specific issue of corroborative data. In this article we highlight some of the thorny and vague areas for discussion surrounding this issue. We also report the results of a poll in which 76 life science journals were asked about their current or intended policies on the inclusion of corroborative studies in papers containing microarray data.

5.
Advances in high-throughput sequencing (HTS) have fostered rapid developments in the field of microbiome research, and massive microbiome datasets are now being generated. However, the diversity of software tools and the complexity of analysis pipelines make it difficult to access this field. Here, we systematically summarize the advantages and limitations of microbiome methods. Then, we recommend specific pipelines for amplicon and metagenomic analyses, and describe commonly used software and databases, to help researchers select the appropriate tools. Furthermore, we introduce statistical and visualization methods suitable for microbiome analysis, including alpha- and beta-diversity, taxonomic composition, difference comparisons, correlation, networks, machine learning, evolution, source tracing, and common visualization styles, to help researchers make informed choices. Finally, a step-by-step reproducible analysis guide is introduced. We hope this review will allow researchers to carry out data analysis more effectively and to quickly select the appropriate tools in order to efficiently mine the biological significance behind the data.
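As a hedged sketch of the alpha- and beta-diversity statistics mentioned above (toy OTU counts; real pipelines would use the dedicated packages the review recommends), the following Python snippet computes Shannon diversity per sample and pairwise Bray-Curtis dissimilarities:

```python
import numpy as np

# Toy OTU count table: rows = samples, columns = taxa (hypothetical values).
counts = np.array([
    [120, 30, 0, 5],
    [80, 60, 10, 0],
    [10, 10, 100, 90],
], dtype=float)

def shannon(row):
    """Shannon alpha diversity (natural log) of one sample's counts."""
    p = row / row.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' counts."""
    return float(np.abs(a - b).sum() / (a + b).sum())

alpha = [shannon(row) for row in counts]
n = counts.shape[0]
beta = np.array([[bray_curtis(counts[i], counts[j]) for j in range(n)]
                 for i in range(n)])

print("alpha (Shannon):", np.round(alpha, 3))
print("beta (Bray-Curtis):\n", np.round(beta, 3))
```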

6.
Designers have a saying that "the joy of an early release lasts but a short time. The bitterness of an unusable system lasts for years." It is indeed disappointing to discover that your data resources are not being used to their full potential. Not only have you invested your time, effort, and research grant on the project, but you may face costly redesigns if you want to improve the system later. This scenario would be less likely if the product was designed to provide users with exactly what they need, so that it is fit for purpose before its launch. We work at EMBL-European Bioinformatics Institute (EMBL-EBI), and we consult extensively with life science researchers to find out what they need from biological data resources. We have found that although users believe that the bioinformatics community is providing accurate and valuable data, they often find the interfaces to these resources tricky to use and navigate. We believe that if you can find out what your users want even before you create the first mock-up of a system, the final product will provide a better user experience. This would encourage more people to use the resource and they would have greater access to the data, which could ultimately lead to more scientific discoveries. In this paper, we explore the need for a user-centred design (UCD) strategy when designing bioinformatics resources and illustrate this with examples from our work at EMBL-EBI. Our aim is to introduce the reader to how selected UCD techniques may be successfully applied to software design for bioinformatics.

7.
Gut microbiome community analysis is used to understand many diseases, such as inflammatory bowel disease, obesity, and diabetes. Sampling methods are an important consideration for human microbiome research, yet are not emphasized in many studies. In this study, we demonstrate that the preparation, handling, and storage of human faeces are critical processes that alter the outcomes of downstream DNA-based bacterial community analyses via qPCR. We found that stool subsampling resulted in large variability of gut microbiome data, owing to the different microenvironments harbouring various taxa within an individual stool. However, we reduced intra-sample variability by homogenizing the entire stool sample in liquid nitrogen and subsampling from the resulting crushed powder prior to DNA extraction. We experimentally determined that bacterial taxa varied with room-temperature storage beyond 15 minutes and beyond three days of storage in a domestic frost-free freezer. Freeze-thawing affected bacterial taxon abundance only beyond four cycles, but the use of samples stored in RNAlater should be avoided, as it reduced both overall DNA yields and the detection of bacterial taxa. Overall, we provide solutions for processing and storing human stool samples that reduce the variability of microbiome data. We recommend that stool be frozen within 15 minutes of defecation, stored in a domestic frost-free freezer for less than three days, and homogenized prior to DNA extraction. Adoption of these simple protocols will have a significant and positive impact on future human microbiome research.
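The recommendations above translate naturally into a sample-handling quality check. This Python sketch (the metadata fields and function are hypothetical; the numeric thresholds come directly from the abstract) flags samples that deviate from the reported limits:

```python
from dataclasses import dataclass

@dataclass
class StoolSample:
    minutes_to_freezer: float      # time from defecation to freezing
    days_in_domestic_freezer: int  # storage in a frost-free freezer
    freeze_thaw_cycles: int
    stored_in_rnalater: bool
    homogenized_before_extraction: bool

def handling_warnings(s: StoolSample) -> list[str]:
    """Flag deviations from the thresholds reported in the abstract."""
    warnings = []
    if s.minutes_to_freezer > 15:
        warnings.append("not frozen within 15 minutes of defecation")
    if s.days_in_domestic_freezer > 3:
        warnings.append("stored >3 days in a domestic frost-free freezer")
    if s.freeze_thaw_cycles > 4:
        warnings.append("more than four freeze-thaw cycles")
    if s.stored_in_rnalater:
        warnings.append("RNAlater storage reduces DNA yield and taxon detection")
    if not s.homogenized_before_extraction:
        warnings.append("stool not homogenized before DNA extraction")
    return warnings

sample = StoolSample(20, 2, 1, False, True)
print(handling_warnings(sample))  # ['not frozen within 15 minutes of defecation']
```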

8.
Microarray experiments can generate enormous amounts of data, but large datasets are usually inherently complex, and the relevant information they contain can be difficult to extract. For the practicing biologist, we provide an overview of what we believe to be the most important issues that need to be addressed when dealing with microarray data. In a microarray experiment we are simply trying to identify which genes are the most "interesting" in terms of our experimental question, and these will usually be those that are either overexpressed or underexpressed (upregulated or downregulated) under the experimental conditions. Analysis of the data to find these genes involves first preprocessing the raw data for quality control, including filtering of the data (e.g., detection of outlying values), followed by standardization of the data (i.e., making the data uniformly comparable throughout the dataset). This is followed by the formal quantitative analysis of the data, which will involve either statistical hypothesis testing or multivariate pattern recognition. Statistical hypothesis testing is the usual approach to "class comparison," where several experimental groups are being directly compared. The best approach to this problem is to use analysis of variance, although issues related to multiple hypothesis testing and probability estimation still need to be evaluated. Pattern recognition can involve "class prediction," for which a range of supervised multivariate techniques are available, or "class discovery," for which an even broader range of unsupervised multivariate techniques have been developed. Each technique has its own limitations, which need to be kept in mind when making a choice from among them. To put these ideas in context, we provide a detailed examination of two specific examples of the analysis of microarray data, both from parasitology, covering many of the most important points raised.
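To make the "class comparison" workflow concrete, here is a hedged Python sketch (simulated expression values; not the analysis from the cited parasitology examples) that runs a per-gene one-way ANOVA and then applies a Benjamini-Hochberg correction for the multiple-testing issue raised above:

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)

# Hypothetical expression matrix: 1000 genes x 9 arrays, three conditions.
n_genes, reps = 1000, 3
control = rng.normal(0, 1, (n_genes, reps))
treat_a = rng.normal(0, 1, (n_genes, reps))
treat_b = rng.normal(0, 1, (n_genes, reps))
treat_b[:50] += 2.0  # spike in 50 truly upregulated genes

# One-way ANOVA per gene across the three classes ("class comparison").
pvals = np.array([
    f_oneway(control[i], treat_a[i], treat_b[i]).pvalue for i in range(n_genes)
])

# Benjamini-Hochberg correction to handle the multiple-testing problem.
reject, qvals, *_ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"genes called significant at FDR 0.05: {reject.sum()}")
```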

9.
Multiple bioinformatic methods are available to analyse the information encoded within the complete genome sequence of a bacterium and accurately assign its species status or nearest phylogenetic neighbour. However, it is clear that even now, in the third decade of bacterial genomics, taxonomically incorrect genome sequence depositions are still being made. We outline a simple scheme of bioinformatic analysis and a set of minimum criteria that should be applied to all bacterial genomic data to ensure that they are accurately assigned to the species or genus level prior to database deposition. To illustrate the utility of the bioinformatic workflow, we analysed the recently deposited genome sequence of Lactobacillus acidophilus 30SC and demonstrated that this DNA was in fact derived from a strain of Lactobacillus amylovorus. Using these methods, researchers can ensure that the taxonomic accuracy of genome sequence depositions is maintained within ever-increasing nucleic acid datasets.
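As one crude stand-in for the kind of relatedness check such minimum criteria imply (the authors' actual workflow is not reproduced here), this Python sketch estimates genome distance from shared k-mers using the Mash distance formula, with toy sequences standing in for a deposited genome and a type-strain genome:

```python
import math

def kmers(seq: str, k: int = 15) -> set[str]:
    """All overlapping k-mers of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def mash_distance(seq_a: str, seq_b: str, k: int = 15) -> float:
    """Jaccard-based Mash distance, a quick proxy for genome relatedness."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    j = len(a & b) / len(a | b)
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k

# Toy sequences standing in for a deposited genome and a type-strain genome.
deposited = "ATGGCGTACGTTAGCGGATCCGTAGCTAGCTAGGCTTACGATCGATCGTACGATCG" * 10
reference = deposited[:400] + "TTTT" + deposited[404:]  # a few differences
print(f"estimated distance: {mash_distance(deposited, reference):.4f}")
```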

10.
Prevailing 16S rRNA gene-amplicon methods for characterizing the bacterial microbiome of wildlife are economical, but result in coarse taxonomic classifications, are subject to primer and 16S copy number biases, and do not allow for direct estimation of microbiome functional potential. While deep shotgun metagenomic sequencing can overcome many of these limitations, it is prohibitively expensive for large sample sets. Here we evaluated the ability of shallow shotgun metagenomic sequencing to characterize taxonomic and functional patterns in the faecal microbiome of a model population of feral horses (Sable Island, Canada). Since 2007, this unmanaged population has been the subject of an individual-based, long-term ecological study. Using deep shotgun metagenomic sequencing, we determined the sequencing depth required to accurately characterize the horse microbiome. In comparing conventional vs. high-throughput shotgun metagenomic library preparation techniques, we validate the use of more cost-effective laboratory methods. Finally, we characterize similarities between 16S amplicon and shallow shotgun characterization of the microbiome, and demonstrate that the latter recapitulates biological patterns first described in a published amplicon data set. Unlike for amplicon data, we further demonstrate how shallow shotgun metagenomic data provide useful insights regarding microbiome functional potential which support previously hypothesized diet effects in this study system.

11.
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, notable progress has been made in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analysing and interpreting these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure.
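As a minimal sketch of one widely used exploratory technique from this family (toy counts; the Hellinger-plus-PCA choice is my assumption, not drawn from the review), the following Python code ordinates a small community table:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical community table: 12 samples x 20 taxa, two habitat groups.
counts = rng.poisson(5, (12, 20)).astype(float)
counts[:6, :5] += 20  # group 1 enriched in the first five taxa

# Hellinger transformation, commonly used before ordination of count data.
hellinger = np.sqrt(counts / counts.sum(axis=1, keepdims=True))

# Unconstrained ordination (PCA) via SVD of the centred matrix.
centred = hellinger - hellinger.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
scores = u * s  # sample scores on the principal components
explained = s**2 / (s**2).sum()

print(f"PC1/PC2 explain {explained[0]:.0%} and {explained[1]:.0%} of variance")
print("PC1 sample scores:", np.round(scores[:, 0], 2))
```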

12.
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model, for instance) are already available in RNA-Seq-focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.
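A minimal Python sketch of the mixture-model idea (simulated counts; a fixed dispersion is assumed here, whereas edgeR and DESeq estimate it from replicates) shows how a Negative Binomial GLM can absorb library-size differences through an offset instead of rarefying:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# One taxon's counts across 10 samples with very different library sizes.
libsize = rng.integers(5_000, 50_000, 10).astype(float)
group = np.repeat([0, 1], 5)                   # two sample classes
true_rate = np.where(group == 1, 4e-3, 1e-3)   # taxon enriched in class 1
counts = rng.poisson(true_rate * libsize)

# Negative Binomial GLM with log library size as an offset, so library
# size differences are modelled rather than rarefied away.
X = sm.add_constant(group.astype(float))
model = sm.GLM(counts, X, family=sm.families.NegativeBinomial(alpha=0.5),
               offset=np.log(libsize))
fit = model.fit()
print(f"group effect p-value: {fit.pvalues[1]:.4g}")
```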

13.

Background

The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.

Results

Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.

Conclusions

The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

14.
We present what we believe to be a novel statistical contact potential based on solved structures of transmembrane (TM) α-helical bundles, and we use this contact potential to investigate the amino acid likelihood of stabilizing helix-helix interfaces. To increase statistical significance, we have reduced the full contact energy matrix to a four-flavor alphabet of amino acids, automatically determined by our methodology, in which we find that polarity is a more dominant factor of group identity than is size, with charged or polar groups most often occupying the same face, whereas polar/apolar residue pairs tend to occupy opposite faces. We found that the most polar residues strongly influence interhelical contact formation, although they occur rarely in TM helical bundles. Two-body contact energies in the reduced letter code are capable of determining native structure from a large decoy set for a majority of test TM proteins, at the same time illustrating that certain higher-order sequence correlations are necessary for more accurate structure predictions.
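To illustrate the flavour of a reduced-alphabet contact potential (the group assignments and energy values below are entirely hypothetical; the paper derives its four flavors and energies automatically from solved TM structures), here is a small Python sketch that scores residue pairs at a helix-helix interface:

```python
import numpy as np

# Hypothetical four-flavor grouping driven by polarity (illustrative only;
# the paper determines its own groups from the data).
AA_TO_GROUP = {
    **dict.fromkeys("DEKRH", 0),   # charged
    **dict.fromkeys("STNQYC", 1),  # polar
    **dict.fromkeys("GAP", 2),     # small apolar
    **dict.fromkeys("VLIMFW", 3),  # large apolar
}

# Hypothetical symmetric 4x4 contact energies (favourable = negative).
ENERGY = np.array([
    [-1.2, -0.6,  0.3,  0.5],
    [-0.6, -0.8,  0.2,  0.4],
    [ 0.3,  0.2, -0.3, -0.4],
    [ 0.5,  0.4, -0.4, -0.7],
])

def contact_energy(pairs):
    """Sum two-body energies over residue pairs at a helix-helix interface."""
    return sum(ENERGY[AA_TO_GROUP[a], AA_TO_GROUP[b]] for a, b in pairs)

# Score a native-like apolar interface against a decoy with mixed polarity.
native = [("L", "I"), ("V", "F"), ("S", "T"), ("G", "A")]
decoy = [("L", "K"), ("V", "D"), ("S", "W"), ("G", "R")]
print(f"native: {contact_energy(native):+.2f}  decoy: {contact_energy(decoy):+.2f}")
```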

15.
Biostatistical methods have become thoroughly integrated into modern biomedical and clinical research. Nevertheless, every observer who has evaluated articles in medical journals has noted that as many as half the reported results were based on questionable statistical analysis. This situation, combined with the fact that most errors involve relatively simple statistical procedures, points to the need for researchers and practitioners to be able to personally judge the quality of the statistical analyses in what they read. Fortunately, there are several excellent papers and texts available for those interested.

16.
Critical comparison of consensus methods for molecular sequences.
Consensus methods are recognized as valuable tools for data analysis, especially when some sort of data aggregation is desired. Although consensus methods for sequences play a vital role in molecular biology, researchers pay little heed to the features and limitations of such methods, and so there are risks that criteria for constructing consensus sequences will be misused or misunderstood. To understand better the issues involved, we conducted a critical comparison of nine consensus methods for sequences, of which eight were used in papers appearing in this journal. We report the results of that comparison, and we make recommendations which we hope will assist researchers when they must select particular consensus methods for particular applications.
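As a hedged example of the simplest criterion in this family, majority rule (toy aligned sequences; the paper compares nine methods, most more elaborate than this), here is a short Python implementation with an adjustable frequency threshold:

```python
from collections import Counter

def majority_consensus(seqs: list[str], threshold: float = 0.5) -> str:
    """Majority-rule consensus: call a residue only if its frequency at
    that column exceeds the threshold; otherwise 'N' marks ambiguity."""
    consensus = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if count / len(seqs) > threshold else "N")
    return "".join(consensus)

aligned = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGAAC"]
print(majority_consensus(aligned))       # ACGTAC
print(majority_consensus(aligned, 0.9))  # ACNNNC -- stricter threshold
```

Raising the threshold trades completeness for confidence, which is exactly the kind of criterion-dependent behaviour the comparison warns can be misused when left implicit.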

17.
18.
Pathway analysis using random forests classification and regression
MOTIVATION: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single-gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers.
RESULTS: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases, and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy for dissecting pathways and can provide biological insight into the study of microarray data.
AVAILABILITY: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.
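A minimal Python sketch of the pathway-based classification idea (simulated data and hypothetical pathway memberships; the authors' own implementation is the R code linked above) fits one Random Forest per pathway and ranks pathways by out-of-bag accuracy:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

# Hypothetical expression data: 60 samples x 200 genes, binary outcome.
X = rng.normal(0, 1, (60, 200))
y = rng.integers(0, 2, 60)
X[y == 1, :10] += 1.0  # genes 0-9 carry signal

# Hypothetical pathway membership: gene index lists per pathway.
pathways = {"pathway_A": list(range(0, 10)),
            "pathway_B": list(range(50, 70)),
            "pathway_C": list(range(120, 135))}

# Fit one Random Forest per pathway and rank pathways by out-of-bag
# accuracy, mirroring the pathway-ranking idea described above.
scores = {}
for name, genes in pathways.items():
    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(X[:, genes], y)
    scores[name] = rf.oob_score_

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: OOB accuracy = {score:.2f}")
```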

19.
Statistics plays a crucial role in research, planning, and decision-making in the health sciences. Progress in technology and continued research in computational statistics have enabled us to implement sophisticated mathematical models within software that is handled by non-statistician researchers. As a result, over recent decades, medical journals have published a host of papers that use novel statistical methods. The aim of this paper is to review how statistical methods are being applied in the construction of scientific knowledge in the health sciences, and to propose some improvements. From the early twentieth century, there has been a remarkable surge in scientific evidence alerting us to the errors that many non-statistician researchers were making in applying statistical methods. Today, several studies continue to show that a large percentage of articles published in high-impact-factor journals contain errors in data analysis or interpretation of results, with ensuing repercussions for the validity and efficiency of the research conducted. The scientific community should reflect on the causes that have led to this situation, the consequences for the advancement of scientific knowledge, and possible solutions to this problem.

20.
In this paper, we discuss the potential for using engineering methods originally developed for the design of embedded computer systems to analyse biological cell systems. For embedded systems, as for biological cell systems, design is a feature that defines their identity. The assembly of different components in designs of both systems can vary widely. In contrast to the biology domain, the computer engineering domain can quickly evaluate design options and the consequences of its systems through methods for computer-aided design, in particular design space exploration. We argue that there are enough concrete similarities between the two kinds of systems to assume that the engineering methodology from the computer systems domain, in particular that related to embedded systems, can be applied to the domain of cellular systems. This will help in understanding the myriad of different design options cellular systems have. First we compare computer systems with cellular systems. Then, we discuss exactly what features of engineering methods could aid researchers in the analysis of cellular systems, and what benefits could be gained.
