Similar literature
Found 20 similar articles (search time: 15 ms)
1.

Background

The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.

Results

Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.

Conclusions

The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

2.

Background  

Systems Biology Markup Language (SBML) is gaining broad usage as a standard for representing dynamical systems as data structures. The open source statistical programming environment R is widely used by biostatisticians involved in microarray analyses. An interface between SBML and R does not exist, though one might be useful to R users interested in SBML, and SBML users interested in R.

3.

Background

For years, emerging infectious diseases have appeared worldwide and threatened the health of people. The emergence and spread of an infectious-disease outbreak are usually unforeseen, and have the features of suddenness and uncertainty. Timely understanding of basic information in the field, and the collection and analysis of epidemiological information, is helpful in making rapid decisions and responding to an infectious-disease emergency. Therefore, it is necessary to have an unobstructed channel and convenient tool for the collection and analysis of epidemiologic information in the field.

Methodology/Principal Findings

Baseline information for each county in mainland China was collected and a database was established by geo-coding information on a digital map of county boundaries throughout the country. Google Maps was used to display geographic information and to conduct calculations related to maps, and the 3G wireless network was used to transmit information collected in the field to the server. This study established a decision support system for the response to infectious-disease emergencies based on WebGIS and mobile services (DSSRIDE). The DSSRIDE provides functions including data collection, communication and analyses in real time, epidemiological detection, the provision of customized epidemiological questionnaires and guides for handling infectious disease emergencies, and the querying of professional knowledge in the field. These functions of the DSSRIDE could be helpful for epidemiological investigations in the field and the handling of infectious-disease emergencies.

Conclusions/Significance

The DSSRIDE provides a geographic information platform based on the Google Maps application programming interface to display information on infectious-disease emergencies, and transfers information between workers in the field and decision makers through wireless transmission based on personal computers, mobile phones and personal digital assistants. After two years of use in infectious-disease emergencies, the DSSRIDE is becoming a useful platform and tool for field investigations carried out by response sections and individuals. The system is suitable for use in developing countries and low-income districts.

4.

Introduction

In metabolomics studies, unwanted variation inevitably arises from various sources. Normalization, that is, the removal of unwanted variation, is an essential step in the statistical analysis of metabolomics data. However, metabolomics normalization is often considered an imprecise science because the sources of variation are diverse and a number of alternative strategies are available.

Objectives

We highlight the need for comparative evaluation of different normalization methods and present software strategies to help ease this task for both data-oriented and biological researchers.

Methods

We present NormalizeMets, a joint graphical user interface within the familiar Microsoft Excel and the freely available R software, for comparative evaluation of different normalization methods. The NormalizeMets R package, along with the vignette describing the workflow, can be downloaded from https://cran.r-project.org/web/packages/NormalizeMets/. The Excel interface and the Excel user guide are available at https://metabolomicstats.github.io/ExNormalizeMets.

Results

NormalizeMets allows for comparative evaluation of normalization methods using criteria that depend on the given dataset and the ultimate research question. It thus guides researchers in assessing, selecting and implementing a suitable normalization method using the familiar Microsoft Excel and/or the freely available R software. In addition, the package can be used to visualise metabolomics data with interactive graphical displays and to obtain end statistical results for clustering, classification, biomarker identification adjusting for confounding variables, and correlation analysis.

Conclusion

NormalizeMets is designed for comparative evaluation of normalization methods, and can also be used to obtain end statistical results. The use of freely available R software offers an attractive proposition for programming-oriented researchers, and the Excel interface offers a familiar alternative for most biological researchers. The package handles the data locally on the user’s own computer, allowing reproducible code to be stored locally.

5.

Background  

There are several isolated tools for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for the analysis of Affymetrix microarray expression data we have developed Array2BIO, an application that couples several analytical methods into a single web based utility.

6.

Background  

Analysis of the plethora of metabolites found in the NMR spectra of biological fluids or tissues requires data complexity to be simplified. We present a graphical user interface (GUI) for NMR-based metabonomic analysis. The "Metabonomic Package" has been developed for metabonomics research as open-source software and uses the R statistical libraries.

7.
pROC: an open-source package for R and S+ to analyze and compare ROC curves   (Total citations: 3; self-citations: 0; citations by others: 3)

Background  

Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curve analysis we developed pROC, a package for R and S+ that contains a set of tools for displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface.
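
As a rough illustration of what an ROC curve and its area under the curve (AUC) are, the following Python sketch builds the curve by sweeping a decision threshold over classifier scores and integrates it with the trapezoidal rule. It is not pROC's interface or implementation (pROC is an R and S+ package); the function names `roc_points` and `auc` and the example scores are hypothetical, chosen only for this sketch.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points of an ROC curve.

    labels: 1 for a positive case, 0 for a negative case.
    scores: classifier scores, where a higher score means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the threshold from the highest score down to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Handle tied scores together so ties produce a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical toy data: two negatives and two positives.
    example_labels = [0, 0, 1, 1]
    example_scores = [0.1, 0.4, 0.35, 0.8]
    print(auc(roc_points(example_labels, example_scores)))  # prints 0.75
```

An AUC of 0.5 corresponds to a classifier no better than chance and 1.0 to perfect separation. Comparing two curves statistically, which is the kind of analysis the abstract describes, needs additional machinery (for example confidence intervals or significance tests) that this sketch does not attempt.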

8.

Background  

Once a new genome is sequenced, one of the important questions is which biological pathways are present and which are absent. Analysis of biological pathways in a genome is a complicated task, since a number of biological entities are involved in pathways and biological pathways in different organisms are not identical. Computational pathway identification and analysis thus involves a number of computational tools and databases and is typically done in comparison with pathways in other organisms. This computational requirement is well beyond the capability of most biologists, so information systems for reconstructing, annotating, and analyzing biological pathways are much needed. We introduce a new comparative pathway analysis workbench, ComPath, which integrates various resources and computational tools using an interactive spreadsheet-style web interface for reliable pathway analyses.

9.

Background  

Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses.

10.
Characterizing genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata are not always easily integrated into these methods in a user‐friendly fashion. Here, we present a deployable Python‐based web‐tool, mvmapper, for visualizing and exploring results of multivariate analyses in geographic space. This tool can be used to map results of virtually any multivariate analysis of georeferenced data, and routines for exporting results from a number of standard methods have been integrated in the R package adegenet, including principal components analysis (PCA), spatial PCA, discriminant analysis of principal components, principal coordinates analysis, nonmetric multidimensional scaling and correspondence analysis. mvmapper's greatest strength is facilitating dynamic and interactive exploration of the statistical and geographic frameworks side by side, a task that is difficult and time‐consuming with currently available tools. Source code and deployment instructions, as well as a link to a hosted instance of mvmapper, can be found at https://popphylotools.github.io/mvMapper/.

11.

Background  

Current tools for co-phylogenetic analyses are not able to cope with the continuous accumulation of phylogenetic data. The sophisticated statistical test for host-parasite co-phylogenetic analyses implemented in Parafit does not allow it to handle large datasets in reasonable times. The Parafit and DistPCoA programs are by far the most compute-intensive components of the Parafit analysis pipeline. We present AxParafit and AxPcoords (Ax stands for Accelerated), which are highly optimized versions of Parafit and DistPCoA respectively.

12.

Background

Age at menarche is considered a reliable prognostic factor for idiopathic scoliosis and varies in different geographic latitudes. Adolescent idiopathic scoliosis prevalence has also been reported to be different in various latitudes and demonstrates higher values in northern countries. A study on epidemiological reports from the literature was conducted to investigate a possible association between prevalence of adolescent idiopathic scoliosis and age at menarche among normal girls in various geographic latitudes. An attempt is also made to implicate a possible role of melatonin in the above association.

Material-methods

20 peer-reviewed published papers reporting adolescent idiopathic scoliosis prevalence and 33 peer-reviewed papers reporting age at menarche in normal girls from most geographic areas of the northern hemisphere were retrieved from the literature. The geographic latitude of each centre where a study originated was documented. The statistical analysis included regression of adolescent idiopathic scoliosis prevalence and of age at menarche by latitude.

Results

The regressions of adolescent idiopathic scoliosis prevalence and of age at menarche by latitude are statistically significant (p < 0.001), and their regression curves follow a parallel declining course, especially at latitudes north of 25 degrees.

Conclusion

Late age at menarche parallels higher prevalence of adolescent idiopathic scoliosis. Pubarche appears later in girls who live in northern latitudes, which prolongs the period of spine vulnerability while other pre-existing or aetiological factors contribute to the development of adolescent idiopathic scoliosis. A possible role of geography in the pathogenesis of idiopathic scoliosis is discussed: latitude, which determines sunlight exposure, appears to influence melatonin secretion and modify age at menarche, which in turn is associated with the prevalence of idiopathic scoliosis.

13.

Background  

R is the preferred tool for statistical analysis of many bioinformaticians due in part to the increasing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments where large amounts of data are generated, for example using high-throughput screening devices, the processing time required to analyze data is often quite long. A solution to reduce the processing time is the use of parallel computing technologies. Because R does not support parallel computations, several tools have been developed to enable such technologies. However, these tools require multiple modifications to the way R programs are usually written or run. Although these tools can ultimately speed up the calculations, the time, skills and additional resources required to use them are an obstacle for most bioinformaticians.

14.
15.

Background  

With the amount of influenza genome sequence data growing rapidly, researchers need machine assistance in selecting datasets and exploring the data. Enhanced visualization tools are required to represent results of the exploratory analysis on the web in an easy-to-comprehend form and to facilitate convenient information retrieval.

16.

Background  

SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations found in both pure web service technologies and pure semantic web technologies.

17.
18.

Background

Interviewer-administered surveys are an important method of collecting population-level epidemiological data, but suffer from declining response rates and increasing costs. Web surveys offer more rapid data collection and lower costs. There are concerns, however, about data quality from web surveys. Previous research has largely focused on selection biases, and few studies have explored measurement differences. This paper aims to assess the extent to which mode affects the responses given by the same respondents at two points in time, providing information on potential measurement error if web surveys are used in the future.

Methods

527 participants from the third British National Survey of Sexual Attitudes and Lifestyles (Natsal-3), which uses computer assisted personal interview (CAPI) and self-interview (CASI) modes, subsequently responded to identically-worded questions in a web survey. McNemar tests assessed whether within-person differences in responses were at random or indicated a mode effect, i.e. higher reporting of more sensitive responses in one mode. An analysis of pooled responses by generalized estimating equations addressed the impact of gender and question type on change.
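
To make the paired comparison concrete, here is a minimal Python sketch of a McNemar test on the responses of the same people in two modes. The abstract does not say which variant of the test was used; this sketch assumes the exact (binomial) two-sided version, and the function name `mcnemar_exact` and the counts in the example are hypothetical rather than taken from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: respondents who reported the behaviour in mode A but not in mode B.
    c: respondents who reported the behaviour in mode B but not in mode A.
    Respondents who gave the same answer in both modes do not enter the test.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a mode effect
    k = min(b, c)
    # Under the null hypothesis of no mode effect, each discordant pair is
    # equally likely to fall in either direction: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical counts: 1 person reported only in the interview,
    # 9 people reported only on the web.
    print(mcnemar_exact(1, 9))  # prints 0.021484375
```

A small p-value indicates that changes in response run predominantly in one direction, which is what the abstract calls a mode effect; roughly equal discordant counts are consistent with changes occurring at random.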

Results

Only 10% of responses changed between surveys. However, mode effects were found for about a third of variables, with higher reporting of sensitive responses more commonly found on the web than in Natsal-3.

Conclusions

The web appears a promising mode for surveys of sensitive behaviours, most likely as part of a mixed-mode design. Our findings suggest that mode effects may vary by question type and content, and by the particular mix of modes used. Mixed-mode surveys need careful development to understand mode effects and how to account for them.

19.

Background  

High-density microarray technology is increasingly applied to study gene expression levels on a large scale. Microarray experiments rely on several critical steps that may introduce error and uncertainty in analyses. These steps include mRNA sample extraction, amplification and labeling, hybridization, and scanning. In some cases this may be manifested as systematic spatial variation on the surface of the microarray, in which expression measurements within an individual array vary as a function of position on the array surface.

20.

Background

The rapid accumulation of whole-genome data has renewed interest in the use of gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements.

Results

MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility.

Conclusions

To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php.
