Similar Articles
20 similar articles retrieved (search time: 15 ms)
1.
NGS (next-generation sequencing)-based metagenomic data analysis is becoming the mainstream approach for the study of microbial communities. Faced with the large amounts of data in metagenomic research, effective data visualization is important for scientists to explore, interpret and manipulate such rich information. The visualization of metagenomic data, especially multi-sample data, is one of the most critical challenges. Differing sample sources, sequencing approaches and heterogeneous data formats make robust and seamless data visualization difficult. Moreover, researchers have different focuses in metagenomic studies: taxonomic or functional, sample-centric or genome-centric, single sample or multiple samples, etc. However, current efforts in metagenomic data visualization cannot fulfill all of these needs, and it is extremely hard to organize all of these visualization effects in a systematic manner. An extendable, interactive visualization tool would be the method of choice to fulfill all of these visualization needs. In this paper, we present MetaSee, an extendable toolbox that facilitates the interactive visualization of metagenomic samples of interest. The main components of MetaSee include: (I) a core visualization engine composed of different views for the comparison of multiple samples: Global view, Phylogenetic view, Sample view and Taxa view, as well as link-outs for more in-depth analysis; (II) a front-end user interface with real metagenomic models that connects to the core visualization engine; and (III) an open-source portal for the development of MetaSee plug-ins. This integrative visualization tool not only provides visualization effects, but also enables researchers to perform in-depth analysis of the metagenomic samples of interest. Moreover, its open-source portal allows for the design of plug-ins for MetaSee, which facilitates the development of additional visualization effects.
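As a rough illustration of the kind of multi-sample comparison view such a tool provides, the sketch below draws a stacked-bar taxonomic composition plot across samples in Python. The taxa, sample names, and abundances are invented for the example; this is not MetaSee's API.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical taxa-by-sample relative abundances (each column sums to 1).
taxa = ["Bacteroidetes", "Firmicutes", "Proteobacteria", "Other"]
samples = ["S1", "S2", "S3"]
abundance = np.array([
    [0.45, 0.30, 0.20],
    [0.35, 0.40, 0.30],
    [0.15, 0.20, 0.40],
    [0.05, 0.10, 0.10],
])

bottom = np.zeros(len(samples))
for name, row in zip(taxa, abundance):
    plt.bar(samples, row, bottom=bottom, label=name)  # stack taxa per sample
    bottom += row
plt.ylabel("Relative abundance")
plt.legend()
plt.savefig("sample_view.png")
```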

2.
Integrating heterogeneous and large omics data constitutes not only a conceptual challenge but also a practical hurdle in the daily analysis of omics data. With the rise of novel omics technologies and through large-scale consortium projects, biological systems are being investigated at an unprecedented scale, generating heterogeneous and often large data sets. These data sets encourage researchers to develop novel data integration methodologies. In this introduction we review the definition of, and characterize current efforts on, data integration in the life sciences. We used a web survey of current research projects on data integration to tap into the views, needs and challenges as currently perceived by parts of the research community.

3.
Areas of life sciences research that were previously distant from each other in ideology, analysis practices and toolkits, such as microbial ecology and personalized medicine, have all embraced techniques that rely on next-generation sequencing instruments. Yet the capacity to generate the data greatly outpaces our ability to analyse it. Existing sequencing technologies are more mature and accessible than the methodologies that are available for individual researchers to move, store, analyse and present data in a fashion that is transparent and reproducible. Here we discuss currently pressing issues with analysis, interpretation, reproducibility and accessibility of these data, and we present promising solutions and venture into potential future developments.

4.
The human evolutionary sciences place high value on quantitative data from traditional small-scale societies that are rapidly modernizing. These data often stem from the sustained ethnographic work of anthropologists who are today nearing the end of their careers. Yet many quantitative ethnographic data are preserved only in summary formats that do not reflect the rich and variable ethnographic reality often described in unpublished field notes, nor the deep knowledge of their collectors. In raw disaggregated formats, such data have tremendous scientific value when used in conjunction with modern statistical techniques and as part of comparative analyses. Through a personal example of longitudinal research with Batek hunter-gatherers that involved collaboration across generations of researchers, we argue that quantifiable ethnographic records, just like material artifacts, deserve high-priority preservation efforts. We discuss the benefits, challenges, and possible avenues forward for digitizing, preserving, and archiving ethnographic data before it is too late.

5.
6.
Photon imaging is a new technique for the quantitative analysis of bioluminescence and chemiluminescence and can be performed both at the macro and micro levels. The high sensitivity and spatial resolution of photon-counting cameras have resulted in the development of new applications in the life sciences. At the macro level, imaging is a valuable tool for the rapid identification of biological samples emitting long-lasting glows in assays using microtitre plates or filter formats (immunoassays, DNA probes, phagocytosis, gene expression, metabolite and drug analysis) and also for in vivo studies of promoter activity. At the micro level, low-light imaging can be used for analysing multiple analytes on micro sensors and for advanced cell analysis (immunocytology, in situ hybridization, identification of cells or tissues expressing the luciferase gene, intracellular or intercellular protein traffic, metabolite analysis and imaging of Ca2+ flux and phosphorylation reactions). Two-dimensional photon-counting instrumentation is a versatile and powerful research tool for imaging and is complementary to conventional luminometers. The main applications to the life sciences involve many types of luminescence assays and can be performed on multiple samples in standard and non-standard formats. Photon-counting coupled to imaging is very helpful in selecting microorganisms or cells expressing bioluminescent genes. Measurements can be made in vitro and in vivo with a sensitivity comparable to that of phototube luminometers.
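To make the quantitative side concrete, here is a minimal Python sketch of background-corrected photon-count quantification over a region of interest; the image, region, and count rates are simulated assumptions, not data from any instrument discussed here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated photon-counting image: Poisson-distributed counts with a
# brighter luminescent spot on a dim background (illustrative numbers).
image = rng.poisson(lam=0.5, size=(64, 64)).astype(float)
image[20:30, 20:30] += rng.poisson(lam=5.0, size=(10, 10))

roi = image[20:30, 20:30]
background = np.median(image)                  # per-pixel background estimate
signal = roi.sum() - background * roi.size     # background-corrected counts
snr = signal / np.sqrt(roi.sum())              # shot-noise-limited SNR
print(f"signal: {signal:.0f} counts, SNR: {snr:.1f}")
```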

7.

Background

The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.

Results

Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.
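As a rough sketch of the distance-then-ordination step of such a workflow (phyloseq itself is an R package; the NumPy code below is a Python stand-in with invented counts, not phyloseq's interface):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical OTU count table: 4 samples x 5 taxa.
counts = np.array([
    [120, 30,  5, 40, 10],
    [110, 25,  8, 35, 12],
    [ 10, 80, 60,  5, 50],
    [ 15, 70, 55,  8, 45],
])
rel = counts / counts.sum(axis=1, keepdims=True)     # relative abundances

# Bray-Curtis distances between samples, then classical PCoA:
d = squareform(pdist(rel, metric="braycurtis"))
n = d.shape[0]
j = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
b = -0.5 * j @ (d ** 2) @ j                          # double-centered matrix
vals, vecs = np.linalg.eigh(b)
order = np.argsort(vals)[::-1]                       # largest eigenvalues first
coords = vecs[:, order[:2]] * np.sqrt(np.maximum(vals[order[:2]], 0))
print(coords)  # 2-D ordination; samples 1-2 separate from samples 3-4
```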

Conclusions

The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

8.
The ability to retrieve relevant information is at the heart of every aspect of research and development in the life sciences industry. Information is often distributed across multiple systems and recorded in a way that makes it difficult to piece together the complete picture. Differences in data formats, naming schemes and network protocols amongst information sources, both public and private, must be overcome, and user interfaces not only need to tap into these diverse information sources but must also assist users in filtering out extraneous information and highlighting the key relationships hidden within an aggregated set of information. The Semantic Web community has made great strides in proposing solutions to these problems, and many efforts are underway to apply Semantic Web techniques to the problem of information retrieval in the life sciences space. This article gives an overview of the principles underlying a Semantic Web-enabled information retrieval system: creating a unified abstraction for knowledge using the RDF semantic network model; designing semantic lenses that extract contextually relevant subsets of information; and assembling semantic lenses into powerful information displays. Furthermore, it provides concrete examples of how these principles can be applied to life science problems, including a scenario involving BioDash, a drug discovery dashboard prototype.
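The "semantic lens" idea (extracting a contextually relevant subgraph from a unified RDF model) can be sketched in a few lines of Python with the rdflib library. The graph contents and the lens query are invented for illustration and are not taken from BioDash.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.aspirin, RDF.type, EX.Drug))
g.add((EX.aspirin, EX.inhibits, EX.COX1))
g.add((EX.COX1, RDF.type, EX.Protein))
g.add((EX.COX1, RDFS.label, Literal("Prostaglandin G/H synthase 1")))

# A "lens": keep only drug-to-target relationships from the full graph.
lens = """
SELECT ?drug ?target WHERE {
    ?drug a ex:Drug ;
          ex:inhibits ?target .
    ?target a ex:Protein .
}
"""
for row in g.query(lens, initNs={"ex": EX}):
    print(row.drug, "inhibits", row.target)
```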

9.
Interest in incorporating life history research from evolutionary biology into the human sciences has grown rapidly in recent years. Two core features of this research have the potential to prove valuable in strengthening theoretical frameworks in the health and social sciences: the idea that there is a fundamental trade-off between reproduction and health; and that environmental influences are important in determining how life histories develop. However, the literature on human life histories has increasingly travelled away from its origins in biology, and become conceptually diverse. For example, there are differences of opinion between evolutionary researchers about the extent to which behavioural traits associate with life history traits to form ‘life history strategies’. Here, I review the different approaches to human life histories from evolutionary anthropologists, developmental psychologists and personality psychologists, in order to assess the evidence for human ‘life history strategies’. While there is precedent in biology for the argument that some behavioural traits, notably risk-taking behaviour, may be linked in predictable ways with life history traits, there is little theoretical or empirical justification for including a very wide range of behavioural traits in a ‘life history strategy’. Given the potential of life history approaches to provide a powerful theoretical framework for understanding human health and behaviour, I then recommend productive ways forward for the field: 1) greater focus on the life history trade-offs which underlie proposed strategies; 2) greater precision when using the language of life history theory and life history strategies; 3) collecting more empirical data, from a diverse range of populations, on linkages between life history traits, behavioural traits and the environment, including the underlying mechanisms which generate these linkages; and 4) greater integration with the social and health sciences.

10.

Background

In contrast to Newton's well-known aphorism that he had been able “to see further only by standing on the shoulders of giants,” the Spanish philosopher Ortega y Gasset is credited with the hypothesis that top-level research cannot be successful without a mass of medium researchers on which the top rests, much as the tip of an iceberg rests on its base.

Methodology/Principal Findings

The Ortega hypothesis predicts that highly-cited papers and medium-cited (or lowly-cited) papers refer equally often to papers with a medium impact. The Newton hypothesis would be supported if top-level research cites previously highly-cited work more frequently than medium-level research does. Our analysis is based on (i) all articles and proceedings papers published in 2003 in the life sciences, health sciences, physical sciences, and social sciences, and (ii) all articles and proceedings papers cited within these publications. The results show that, in all scientific fields, highly-cited work cites previously highly-cited papers more frequently than medium-cited work does.
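In code, the comparison reduces to asking what share of a paper's references were themselves highly cited. A minimal Python sketch with invented reference lists and an assumed citation threshold (not the study's data):

```python
import numpy as np

HIGH = 100  # assumed threshold (citations) for "highly cited"

def share_highly_cited(ref_citation_counts):
    """Fraction of a paper's references that are highly cited."""
    refs = np.asarray(ref_citation_counts)
    return (refs >= HIGH).mean()

top_paper_refs = [250, 40, 180, 500, 90, 120]   # references of a top paper
medium_paper_refs = [12, 30, 75, 9, 110, 44]    # references of a medium paper

print(share_highly_cited(top_paper_refs))     # 0.67 -> Newton-like pattern
print(share_highly_cited(medium_paper_refs))  # 0.17
```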

Conclusions/Significance

We demonstrate that papers contributing to scientific progress in a field lean to a larger extent on previously important contributions than papers contributing little. These findings support the Newton hypothesis and call into question the Ortega hypothesis (given our use of citation counts as a proxy for impact).

11.
Metric data are usually assessed on a continuous scale with good precision, but sometimes agricultural researchers cannot obtain precise measurements of a variable. Values of such a variable cannot then be expressed as real numbers (e.g., 1.51 or 2.56), but often can be represented by intervals into which the values fall (e.g., from 1 to 2 or from 2 to 3). In this situation, statisticians talk about censoring and censored data, as opposed to missing data, where no information is available at all. Traditionally, in agriculture and biology, three methods have been used to analyse such data: (a) when intervals are narrow, some form of imputation (e.g., mid-point imputation) is used to replace the interval, and traditional methods for continuous data are employed (such as analysis of variance [ANOVA] and regression); (b) for time-to-event data, the cumulative proportions of individuals that experienced the event of interest are analysed, instead of the individual observed times-to-event; (c) when intervals are wide and many individuals are collected, non-parametric methods of data analysis are favoured, where counts are considered instead of the individual observed value for each sample element. In this paper, we show that these methods may be suboptimal: the first does not respect the process of data collection, the second leads to unreliable standard errors (SEs), while the third does not make full use of all the available information. As an alternative, methods of survival analysis for censored data can be useful, leading to reliable inferences and sound hypothesis testing. These methods are illustrated using three examples from plant and crop sciences.
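As a sketch of the censored-data alternative, the snippet below fits a normal distribution to interval-censored observations by maximising the interval likelihood, where each observation contributes Φ((U-μ)/σ) - Φ((L-μ)/σ), instead of imputing mid-points. The data and the choice of a normal model are assumptions for illustration; the paper itself applies survival-analysis methods.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Interval-censored observations: each value is only known to lie in (lo, hi).
lo = np.array([1.0, 2.0, 1.0, 3.0, 2.0])
hi = np.array([2.0, 3.0, 2.0, 4.0, 3.0])

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)               # keeps sigma positive
    p = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

res = minimize(neg_log_lik, x0=[2.0, 0.0])  # start at mid-range, sigma = 1
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```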

12.
The hypothesis of a Hierarchy of the Sciences, with physical sciences at the top, social sciences at the bottom, and biological sciences in between, is nearly 200 years old. This order is intuitive and reflected in many features of academic life, but whether it reflects the “hardness” of scientific research, i.e., the extent to which research questions and results are determined by data and theories as opposed to non-cognitive factors, is controversial. This study analysed 2434 papers published across all disciplines that declared to have tested a hypothesis. It was determined how many papers reported “positive” (full or partial) or “negative” support for the tested hypothesis. If the hierarchy hypothesis is correct, then researchers in “softer” sciences should have fewer constraints on their conscious and unconscious biases, and therefore report more positive outcomes. Results confirmed the predictions at all levels considered: discipline, domain and methodology broadly defined. Controlling for observed differences between pure and applied disciplines, and between papers testing one or several hypotheses, the odds of reporting a positive result were around five times higher among papers in the disciplines of Psychology and Psychiatry and Economics and Business than in Space Science, 2.3 times higher in the domain of the social sciences than in the physical sciences, and 3.4 times higher in studies applying behavioural and social methodologies to people than in physical and chemical studies of non-biological material. In all comparisons, biological studies had intermediate values. These results suggest that the nature of hypotheses tested and the logical and methodological rigour employed to test them vary systematically across disciplines and fields, depending on the complexity of the subject matter and possibly other factors (e.g., a field's level of historical and/or intellectual development). On the other hand, these results support the scientific status of the social sciences against claims that they are completely subjective, by showing that, when they adopt a scientific approach to discovery, they differ from the natural sciences only by a matter of degree.
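The headline figures are odds ratios, which are straightforward to reproduce from a 2x2 table of outcomes. A worked Python example with invented counts (not the study's data):

```python
# Papers reporting positive vs negative support for the tested hypothesis.
pos_psych, neg_psych = 90, 10   # a "softer" discipline: 90% positive
pos_space, neg_space = 64, 36   # a "harder" discipline: 64% positive

odds_psych = pos_psych / neg_psych    # 9.00
odds_space = pos_space / neg_space    # ~1.78
odds_ratio = odds_psych / odds_space  # ~5.1, cf. the ~5x reported above
print(round(odds_ratio, 2))
```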

13.
Because of the complexity of plant responses to water deficit, researchers have attempted to identify simplified models to understand critical aspects of the problem by searching for single indicators that would enable evaluations of the effects of environmental changes on the entire plant. However, this reductionist approach, which is often used in plant sciences, makes it difficult to distinguish systemic emergent behaviours. Currently, a new class of models and epistemologies has called attention to the fundamental properties of complex systems. These properties, termed ‘emergent’, are observed at a large scale of the system (top hierarchical level) but cannot be observed or inferred from smaller scales of observation in the same system. We propose that multivariate statistical analysis can provide a suitable tool to quantify global responses to water deficit, allowing a specific and partially quantitative assessment of emergent properties. Based on an experimental study, our results showed that the classical approach of analysing different data sets individually might provide different interpretations of the observed effects of water deficit. These results support the hypothesis that a cross-scale multivariate analysis is an appropriate method to establish models for a systemic understanding of the interactions between plants and their changing environment.
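A minimal sketch of the kind of cross-scale multivariate analysis advocated here: principal component analysis of standardized plant traits in Python, with invented variables and measurements (assumptions for illustration only).

```python
import numpy as np

# Hypothetical traits (columns): stomatal conductance, leaf water potential,
# shoot biomass; rows: individual plants under control or water deficit.
x = np.array([
    [0.42, -0.8, 12.1],   # control
    [0.39, -0.9, 11.5],   # control
    [0.18, -1.6,  7.2],   # water deficit
    [0.15, -1.8,  6.8],   # water deficit
])

xc = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)    # standardize each trait
u, s, vt = np.linalg.svd(xc, full_matrices=False)    # PCA via SVD
scores = u * s                        # plant coordinates on the components
explained = s**2 / np.sum(s**2)       # variance explained per component
print(scores[:, 0], explained[0])     # PC1 separates control from deficit
```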

14.

Background  

Flow cytometry technology is widely used in both health care and research. The rapid expansion of flow cytometry applications has outpaced the development of data storage and analysis tools. Collaborative efforts being taken to eliminate this gap include building common vocabularies and ontologies, designing generic data models, and defining data exchange formats. The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard was recently adopted by the International Society for Advancement of Cytometry. This standard guides researchers on the information that should be included in peer-reviewed publications, but it is insufficient for data exchange and integration between computational systems. The Functional Genomics Experiment (FuGE) model formalizes common aspects of comprehensive and high-throughput experiments across different biological technologies. We have extended the FuGE object model to accommodate flow cytometry data and metadata.
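To give a feel for what such an object-model extension looks like, here is a toy Python sketch in the spirit of a FuGE-style model for flow cytometry. All class and field names are invented for illustration; they are not the authors' schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FcsDataFile:
    """Reference to a raw data file produced by the instrument."""
    path: str
    version: str            # e.g. "FCS3.1"

@dataclass
class FlowCytometryRun:
    """One acquisition, linking sample, instrument settings, and data."""
    instrument: str
    sample_id: str
    parameters: List[str]   # detector/fluorochrome channels
    data: FcsDataFile

run = FlowCytometryRun(
    instrument="hypothetical-cytometer-01",
    sample_id="PBMC-007",
    parameters=["FSC-A", "SSC-A", "FITC-A"],
    data=FcsDataFile(path="pbmc007.fcs", version="FCS3.1"),
)
```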

15.
Twenty-first century life sciences have transformed into data-enabled (also called data-intensive, data-driven, or big data) sciences. They depend principally on data-, computation-, and instrumentation-intensive approaches to seek comprehensive understanding of complex biological processes and systems (e.g., ecosystems, complex diseases, environmental and health challenges). Federal agencies, including the National Science Foundation (NSF), have played and continue to play an exceptional leadership role by innovatively addressing the challenges of the data-enabled life sciences. Yet even more is required, not only to keep up with current developments but also to proactively enable future research needs. Straightforward access to data, computing, and analysis resources will enable true democratization of research competition; investigators will then compete on the merits and broader impact of their ideas and approaches rather than on the scale of their institutional resources. This is the Final Report for Data-Intensive Science Workshops DISW1 and DISW2. The first NSF-funded Data-Intensive Science Workshop (DISW1, Seattle, WA, September 19-20, 2010) surveyed the status of the data-enabled life sciences and identified their challenges and opportunities. This served as a baseline for the second NSF-funded workshop (DISW2, Washington, DC, May 16-17, 2011). Based on the findings of DISW2, the following overarching recommendation to the NSF was proposed: establish a community alliance to be the voice and framework of the data-enabled life sciences. After this Final Report was finished, the Data-Enabled Life Sciences Alliance (DELSA, www.delsall.org) was formed to become a Digital Commons for the life sciences community.

16.
New methods for performing quantitative proteome analyses based on differential labeling protocols or label-free techniques are reported in the literature on an almost monthly basis. In parallel, a correspondingly vast number of software tools for the analysis of quantitative proteomics data has also been described in the literature and produced by private companies. In this article we focus on the review of some of the most popular techniques in the field and present a critical appraisal of several software packages available to process and analyze the data produced. We also describe the importance of community standards to support the wide range of software, which may assist researchers in the analysis of data using different platforms and protocols. It is intended that this review will serve bench scientists both as a useful reference and as a guide to the selection and use of different pipelines to perform quantitative proteomics data analysis. We have produced a web-based tool (http://www.proteosuite.org/?q=other_resources) to help researchers find appropriate software for their local instrumentation, available file formats, and quantitative methodology.

17.
Lam CH, Tsontos VM. PLoS ONE 2011, 6(7): e21810.
Electronic tags have been used widely for more than a decade in studies of diverse marine species. However, despite significant investment in tagging programs and hardware, data management aspects have received insufficient attention, leaving researchers without a comprehensive toolset to manage their data easily. The growing volume of these data holdings, the large diversity of tag types and data formats, and the general lack of data management resources are not only complicating integration and synthesis of electronic tagging data in support of resource management applications but potentially threatening the integrity and longer-term access to these valuable datasets. To address this critical gap, Tagbase has been developed as a well-rounded, yet accessible data management solution for electronic tagging applications. It is based on a unified relational model that accommodates a suite of manufacturer tag data formats in addition to deployment metadata and reprocessed geopositions. Tagbase includes an integrated set of tools for importing tag datasets into the system effortlessly, and provides reporting utilities to interactively view standard outputs in graphical and tabular form. Data from the system can also be easily exported or dynamically coupled to GIS and other analysis packages. Tagbase is scalable and has been ported to a range of database management systems to support the needs of the tagging community, from individual investigators to large scale tagging programs. Tagbase represents a mature initiative with users at several institutions involved in marine electronic tagging research.
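A unified relational model of this kind can be sketched with two joined tables: one holding deployment metadata and one holding time-series geopositions. The schema below is a minimal illustration using Python's built-in sqlite3 module, not Tagbase's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE deployment (
    tag_id     TEXT PRIMARY KEY,
    species    TEXT,
    release_dt TEXT
);
CREATE TABLE position (
    tag_id TEXT REFERENCES deployment(tag_id),
    dt     TEXT,
    lat    REAL,
    lon    REAL
);
""")
con.execute("INSERT INTO deployment VALUES ('T001', 'Thunnus alalunga', '2010-06-01')")
con.executemany("INSERT INTO position VALUES ('T001', ?, ?, ?)",
                [("2010-06-02", 34.1, -120.5), ("2010-06-03", 34.4, -121.0)])

# Join deployment metadata with its reprocessed geopositions.
for row in con.execute("""
        SELECT d.species, p.dt, p.lat, p.lon
        FROM deployment d JOIN position p USING (tag_id)
        ORDER BY p.dt"""):
    print(row)
```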

18.
Do different fields of knowledge require different research strategies? A numerical model exploring different virtual knowledge landscapes revealed two diverging optimal search strategies. Trend following is maximized when the popularity of new discoveries determines the number of individuals researching them. This strategy works best when many researchers explore a few large areas of knowledge. In contrast, individuals or small groups of researchers are better at discovering small bits of information in dispersed knowledge landscapes. Bibliometric data on scientific publications showed a continuous bipolar distribution of these strategies, ranging from the natural sciences, with highly cited publications in journals containing a large number of articles, to the social sciences, with rarely cited publications in many journals containing a small number of articles. The natural sciences seem to adapt their research strategies to landscapes with large, concentrated knowledge clusters, whereas the social sciences seem to have adapted to searching landscapes with many small, isolated knowledge clusters. Similar bipolar distributions were obtained when comparing levels of insularity, estimated by indicators of international collaboration and levels of country self-citation: researchers in academic areas with many journals, such as the social sciences, arts and humanities, were the most isolated, and this held in different regions of the world. The work shows that quantitative measures estimating differences between academic disciplines improve our understanding of different research strategies, eventually helping interdisciplinary research, and may also help improve science policies worldwide.

19.
In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all.

20.
The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, at present, there is no format to describe the final qualitative and quantitative results for proteomics and metabolomics experiments in a simple tabular format. Many downstream analysis use cases are only concerned with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R. We developed the mzTab file format for MS-based proteomics and metabolomics results to meet this need. mzTab is intended as a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML), providing a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files can contain protein, peptide, and small molecule identifications together with experimental metadata and basic quantitative information. The format is not intended to store the complete experimental evidence but provides mechanisms to report results at different levels of detail, ranging from a simple summary of the final results to a representation of the results including the experimental design. This format is ideally suited to make MS-based proteomics and metabolomics results available to a wider biological community outside the field of MS. Several software tools for proteomics and metabolomics have already adopted the format as an output format. The comprehensive mzTab specification document and extensive additional documentation can be found online.

Mass spectrometry (MS) has become a major analysis tool in the life sciences (1). It is currently used in different modes for several “omics” approaches, proteomics and metabolomics being the most prominent. In both disciplines, one major burden in the exchange, communication, and large-scale (re-)analysis of MS-based data is the significant number of software pipelines and, consequently, heterogeneous file formats used to process, analyze, and store these experimental results, including both identification and quantification data. Publication guidelines from scientific journals and funding agencies' requirements for public data availability have led to an increasing amount of MS-based proteomics and metabolomics data being submitted to public repositories, such as those of the ProteomeXchange consortium (2) or, in the case of metabolomics, the resources of the nascent COSMOS (Coordination of Standards in Metabolomics) initiative (3).

In the past few years, the Human Proteome Organization Proteomics Standards Initiative (PSI) has developed several vendor-neutral standard data formats to overcome this representation heterogeneity. The PSI promotes the usage of three XML file formats to fully report the data coming from MS-based proteomics experiments (including related metadata): mzML (4) to store the “primary” MS data (the spectra and chromatograms), mzIdentML (5) to report peptide identifications and inferred protein identifications, and mzQuantML (6) to store the quantitative information associated with these results. Even though the existence of the PSI standard data formats represents a huge step forward, these formats cannot address all use cases related to proteomics and metabolomics data exchange and sharing equally well.

During the development of mzML, mzIdentML, and mzQuantML, the main focus lay on providing an exact and comprehensive representation of the gathered results. All three formats can be used within analysis pipelines and as interchange formats between independent analysis tools. It is thus vital that these formats be capable of storing the full data and analysis that led to the results. Therefore, all three formats have relatively complex schemas, a clear necessity for adequate representation of the complexity found in MS-based data. An inevitable drawback of this approach is that data consumers can find it difficult to quickly retrieve the required information. Several application programming interfaces (APIs) have been developed to simplify software development based on these formats (7-9), but profound proteomics and bioinformatics knowledge is still required in order to use them efficiently and take full advantage of the comprehensive information contained.

The new file format presented here, mzTab, aims to describe the qualitative and quantitative results of MS-based proteomics and metabolomics experiments in a consistent, simpler tabular format, abstracting from the mass spectrometry details. The format contains identifications, basic quantitative information, and related metadata. With mzTab's flexible design, it is possible to report results at different levels, ranging from a simple summary or subset of the complete information (e.g. the final results) to a fairly comprehensive representation of the results including the experimental design. Many downstream analysis use cases are only concerned with the final results of an experiment in an easily accessible format that is compatible with tools such as Microsoft Excel or R (10) and can easily be adapted by existing bioinformatics tools. Therefore, mzTab is ideally suited to make MS proteomics and metabolomics results available to the wider biological community, beyond the field of MS.

mzTab follows a philosophy similar to that of the other tab-delimited format recently developed by the PSI to represent molecular interaction data, MITAB (11). MITAB is a simpler tab-delimited format, whereas PSI-MI XML (12), the more detailed XML-based format, holds the complete evidence. The microarray community makes wide use of the MAGE-TAB format (13), another example of such a solution, which can cover the main use cases and, for the sake of simplicity, is often preferred to the XML standard format MAGE-ML (14). Additionally, in MS-based proteomics, several software packages, such as Mascot (15), OMSSA (16), MaxQuant (17), OpenMS/TOPP (18, 19), and SpectraST (20), also support the export of their results in a tab-delimited format next to a more complete and complex default format. These simple formats do not contain the complete information but are nevertheless sufficient for the most frequent use cases.

mzTab has been designed with the same purpose in mind. It can be used alone or in conjunction with mzML (or other related MS data formats such as mzXML (21), or text-based peak list formats such as MGF), mzIdentML, and/or mzQuantML. Several highly successful concepts from the development of mzIdentML and mzQuantML were adapted to the text-based nature of mzTab. In addition, there is a trend toward more integrated experimental workflows involving both proteomics and metabolomics data. We therefore developed a standard format that can represent both types of information in a single file.
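Because mzTab is plain tab-separated text with typed line prefixes (MTD for metadata, PRH/PRT for the protein header and rows, and so on), the final results can be read with nothing more than the standard library. A minimal Python sketch over an invented fragment (simplified columns, not a complete spec-conformant file):

```python
import csv
from io import StringIO

# A tiny mzTab-like fragment; prefixes follow the published convention.
mztab = StringIO(
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tdescription\tExample quantification summary\n"
    "PRH\taccession\tdescription\tbest_search_engine_score[1]\n"
    "PRT\tP12345\tExample protein A\t0.001\n"
    "PRT\tQ67890\tExample protein B\t0.005\n"
)

metadata, proteins, header = {}, [], None
for row in csv.reader(mztab, delimiter="\t"):
    if not row:
        continue
    prefix, fields = row[0], row[1:]
    if prefix == "MTD":                      # metadata: key/value pairs
        metadata[fields[0]] = fields[1]
    elif prefix == "PRH":                    # protein table header
        header = fields
    elif prefix == "PRT":                    # one protein row
        proteins.append(dict(zip(header, fields)))

print(metadata["mzTab-version"])   # 1.0.0
print(proteins[0]["accession"])    # P12345
```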
